I. REAL PARTY IN INTEREST

The Assignee of the present application is Sumitomo Chemical Company, Ltd. of Osaka,
Japan.

II. RELATED APPEALS, INTERFERENCES, AND JUDICIAL PROCEEDINGS

An Appeal Brief was filed in copending application no. 08/992,914 on December 26,
2006. An Examiner's Answer has been received, and a Reply Brief was filed on July 16, 2007 in
that application.

The copending '914 application is directed to subject matter similar to that of the present
application, and the appeal concerns the same ground of rejection. That is, both the '914 and the
present application present for appeal the question of whether a certain degree of sequence
identity of a gene or protein sequence is sufficient to establish, by the preponderance of the
evidence, an asserted utility for an invention, and the corresponding adequacy of written
description and enablement of "how to use" the invention.

III. STATUS OF CLAIMS

The following is the status of the claims as of the mailing of the Final Office Action on 
August 23, 2006: 

Claims 1, 4-10, 16-23, 28 and 29 are pending in the application.

Claims 6 and 7 are indicated as "objected to" in the Office Action Summary and 
indicated as "allowed" in the Conclusion of the Office Action (paragraph 6 on page 9). Claims 6 
and 7 are independent claims and so Appellants suppose that the correct status of claims 6 and 7 
is "allowed." 

The Examiner's decision rejecting claims 1, 4, 5, 8-10, 16-23, 28 and 29 has been 
appealed. 

Claims 1, 4, 5, 8-10, 16-23, 28 and 29 stand rejected under 35 U.S.C. § 112, first 
paragraph, for alleged lack of written description support in the specification and also for alleged 
lack of enablement by the disclosure of the specification.

been filed pursuant to the Final Office Action

V. SUMMARY OF CLAIMED SUBJECT MATTER

Page references to the specification herein are to the specification as originally filed.

Claim 1 is directed to an isolated nucleic acid that comprises a polynucleotide encoding
an enzyme (raffinose synthase, RFS) that binds a D-galactosyl group through the α(1→6) bond to
the hydroxyl group attached to the carbon atom at the 6-position of the D-glucose residue in a
sucrose molecule to form raffinose. The claimed nucleic acid comprises a polynucleotide
comprising a nucleotide sequence selected from a group of eight (8) sequences variously
described as follows:

(a) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID
NO: 3,

(b) a nucleotide sequence depicted by the 236th to 2584th nucleotides in the
nucleotide sequence as depicted in SEQ ID NO: 4,

(c) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID
NO: 5,

(d) a nucleotide sequence depicted by the 134th to 2467th nucleotides in the
nucleotide sequence as depicted in SEQ ID NO: 6,

(e) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID
NO: 7,

(f) a nucleotide sequence depicted by the 1st to 1719th nucleotides in the nucleotide
sequence as depicted in SEQ ID NO: 8,

(g) a nucleotide sequence obtained from a polynucleotide which is amplified from a
nucleic acid obtained from beet with a combination of a PCR primer selected from the group
consisting of SEQ ID NO: 11 and SEQ ID NO: 13 and a PCR primer selected from the group
consisting of SEQ ID NO: 12 and SEQ ID NO: 14, wherein said nucleotide sequence hybridizes
with a nucleotide sequence complementary to the nucleotide sequence of (a) or (b), in a buffer
comprising 0.9M NaCl and 0.09M citric acid at 65°C to 68°C, and

(h) a nucleotide sequence obtained from a polynucleotide which is amplified from a
nucleic acid obtained from mustard or rapeseed with a combination of a PCR primer selected
from the group consisting of SEQ ID NO: 15, SEQ ID NO: 17 and SEQ ID NO: 19 and a PCR
primer selected from the group consisting of SEQ ID NO: 16, SEQ ID NO: 18 and SEQ ID NO:
20, wherein said nucleotide sequence hybridizes with a nucleotide sequence complementary to
the nucleotide sequence of any one of (c) to (f), in a buffer comprising 0.9M NaCl and 0.09M
citric acid at 65°C to 68°C.

Support for the description of the encoded enzyme as a raffinose synthase having the 
recited biochemical activity is provided in the specification at, e.g. page 2, lines 7-13 and also at 
page 31, line 22 to page 32, line 4. The latter text also describes by citation to the literature 
(Lehle et al.) an assay for raffinose synthase activity. 

The various polynucleotides (a)-(f) are set forth in the specification at least at page 4, line 
19 to page 5, line 13 (numbered as 3. to 9.). The particular sequences are set forth in the 
Sequence Listing originally filed (and re-filed with a Preliminary Amendment on April 29, 
1999). 

The polynucleotide (g) is described in Example 4 beginning at page 42, line 10 (as to
isolation of a cDNA from beet using PCR). The particular primers of SEQ ID NOS: 11-14 are
described in "List 2" at page 13, line 12 and use of these primers to obtain a full length coding
sequence of a raffinose synthase cDNA from beet is described at page 53, lines 1-9. The
particular Sequence Listing Identifiers correspond to sequences in the Sequence Listing.
Hybridization conditions recited in the claim are set forth at page 18, lines 5-10.

The polynucleotide (h) is described at, e.g. Examples 6 and 7 beginning at page 45, line
15. The particular primers recited in the claim are described in "List 3" at page 15, line 22 to
page 16, line 3, and these primers are described as useful for isolating a cDNA of the complete
coding portion of a raffinose synthase gene from mustard or rapeseed at page 53, line 14 to page
54, line 14. The particular Sequence Listing Identifiers correspond to sequences in the Sequence
Listing. Hybridization conditions recited in the claim are set forth at page 18, lines 5-10.

Claims 4-10 are directed to particular isolated nucleic acids having specifically recited 
nucleotide sequences or encoding specifically recited amino acid sequences among the 
polynucleotides (a) to (f) of claim 1.

Claims 16-20 are directed to an isolated nucleic acid comprising a nucleic acid of claim 
1, operatively linked to a promoter, vectors comprising a nucleic acid of claim 1 and 
transformants in which a nucleic acid of claim 1 or such nucleic acid linked to a promoter has 
been introduced into a host cell. These embodiments are described at, e.g. page 6, line 15 to 
page 7, line 2. 

Claims 28 and 29 describe the promoter as one operative in a plant cell or in a yeast cell, 
respectively. Such promoters are described at, e.g. page 29, line 22 to page 30, line 8. 

Claims 21 and 22 describe the host cell as a microorganism or plant cell. These 
embodiments are described at, e.g. page 7, lines 3-6 and page 33, line 17. 

Claim 23 is directed to a method for producing raffinose synthase by culturing or
growing the transformant of claim 18 to produce the raffinose synthase, and collecting the
raffinose synthase so produced. Such is described at, e.g. page 32, lines 4-20. Purification of
raffinose synthase from plants is known in the art, for example as described by Lehle et al. cited
at the bottom of page 31.

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The following grounds of rejection are to be reviewed on appeal: 

Claims 1, 4, 5, 8-10, 16-23, 28 and 29 stand rejected under 35 U.S.C. § 112, first 
paragraph, for alleged lack of written description support in the specification and also for alleged 
lack of enablement by the disclosure of the specification. (As stated in the Final Office Action at
paragraph 4 on page 2 and at paragraph 5 on page 6.)

VII. ARGUMENT

VIIA. Rejections Under 35 U.S.C. § 112, first paragraph - written description
VIIA.1. Claim 1

Claim 1 is rejected under 35 U.S.C. § 112, first paragraph, for alleged lack of written 
description support of the claimed invention. Appellants respectfully submit that this rejection 
should be reversed. 

In the Final Office Action of August 23, 2006, the Examiner makes several distinct
assertions regarding the written description requirement. First, he states that, as to SEQ ID NO: 7
(note polynucleotides (e) and (f)), description of an incomplete coding sequence does not
describe a nucleic acid sequence encoding a raffinose synthase enzyme and that the specification
includes no description of the portions of the amino acid sequence necessary for raffinose
synthase activity. Second, as to nucleic acids isolated from "beet" or "mustard or rapeseed"
(note polynucleotides (g) and (h)), the Examiner asserts that, "merely describing a method by
which a nucleic acid may be isolated does not describe the nucleic acid encoding a raffinose
synthase as asserted by Applicant." Third, as to SEQ ID NO: 3, and presumably applicable to all
of polynucleotides (a), (b) and (e) through (h)¹, the Examiner asserts that a demonstration of the
biological activity (and thus of utility) for a protein of a particular amino acid sequence cannot be
used to support an assertion of similar activity for a protein of similar sequence.

There are no "bright line" tests for whether or not a specification provides adequate
written description of a claimed invention. The Examiner must carefully review the claims, and
carefully review the specification to determine whether, in view of what is known in the art at the
time the application was filed, the specification provides evidence that the inventor was in
"possession" of the invention as claimed. Capon v. Eshhar, 76 USPQ2d 1078 (Fed. Cir. 2005);
Falkner v. Inglis, 79 USPQ2d 1001 (Fed. Cir. 2006).

¹ The polynucleotides of (c) and (d) are allowed as claims 6 and 7.

As to the Examiner's first assertion, SEQ ID NO: 7 (encoded by nucleotides 1 to 1719 of
SEQ ID NO: 8) is indeed only a partial sequence of a raffinose synthase; about 25% of the
full-length sequence is missing from the amino-terminal end. However, the instant claim 1 does not
recite that the claimed polynucleotide "consists of" the recited sequence. Rather, the claim
recites that the polynucleotide "comprises" the recited polynucleotide, and hence may also include
nucleotides encoding any amino acids necessary to complete an amino acid sequence of a
raffinose synthase protein. Two such complete amino acid sequences are disclosed in the present
application as SEQ ID NOS: 3 and 5. Methods for determining the complete nucleotide sequence
of a cDNA encoding raffinose synthase from rapeseed are explained in the specification, as were
used to obtain the complete sequences for the examples from beet and mustard. Alternatively, one
of ordinary skill in the art might simply obtain the missing portion of the enzyme from the
complete cDNAs for these two proteins that are described (SEQ ID NOS: 4 and 6). The Board is
reminded that claim 1 specifically includes as a feature that the encoded protein exhibit a recited
enzymatic activity, and so inoperable embodiments are excluded from the claim.

From the above, it is clear that the specification evidences that the inventors had in their
possession the invention claimed in claim 1, parts (e) and (f). The Examiner has merely stated a
summary conclusion, parroting guidelines to the effect that the specification must set forth an
explicit "structure-function relationship"² used by the USPTO to implement a policy restricting
cloned gene inventions to specifically disclosed species, rather than carefully considering the
facts presented by the instant application and claims as required by Federal Circuit case law.

Notwithstanding the failure of the Examiner to carefully consider the facts of the present
application, he is simply wrong that the specification does not explain parts of the RFS sequence
that should be preserved for activity. The specification explains that certain portions of the
amino acid sequence of a RFS should be constrained to high homology to SEQ ID NO: 3 or to
SEQ ID NO: 5. See, pp. 20-21, indicating portions of high homology (accounting for both
sequences) from amino acids 103 to 213, 255 to 275, 289 to 326 and 609 to 696.

² See, pp. 3-4 of the Office Action mailed March 1, 2005, referenced in the Office Action of December 2, 2005, as
the "previous Office Action" addressing this issue.

As to the Examiner's second assertion, Federal Circuit case law makes abundantly clear
that it is permissible for an Applicant to claim an invention in product-by-process terms. See,
e.g. Enzo Biochem, Inc. v. Gen-Probe, Inc., 63 USPQ2d 1609 (Fed. Cir. 2002) and Fiers v.
Revel, 25 USPQ2d 1601, 1605 (Fed. Cir. 1993). The Examiner's position with respect to parts
(g) and (h) of claim 1 is simply complete legal error.

The Examiner's third argument is in the first place more related to the issue of
enablement of the utility of the invention than to the written description of the structure of the
invention. More substantively, the Examiner's third argument is contrary to the substantial
evidence in the record of the present application. The present specification describes in detail a
method for cloning raffinose synthase (RFS) genes from plants of broadly diverse genera
(Glycine, Beta, Brassica). The specification describes using sequence information of a cDNA
encoding part of a RFS from Glycine max (SEQ ID NO: 2) to prepare a set of PCR primers that
will hybridize to a degenerate set of sequences³ and that are used to amplify mRNA obtained from
other plants and so isolate fragments of RFS cDNA. These initial amplification products are
sequenced, and those further data are used to prepare a new set of primers specific for RFS for the
particular plant being studied. The second set of primers is used to make new amplification
products that are cloned and from which the complete sequence of the cDNA is obtained (see
Examples 1-6 of the specification). This approach was used three times in working examples
of the present specification to successfully obtain RFS cDNAs from three different plants. The
complete coding portions of the cDNA for Beta vulgaris and Brassica juncea (SEQ ID NOS: 4
and 6, respectively) and part of the coding portion of a cDNA from Brassica napus (SEQ ID NO:
8) are presented. Appellants have presented evidence in the form of a Declaration of Dr.
Watanabe that demonstrates unequivocally that a protein having the amino acid sequence of SEQ
ID NO: 5 has biological activity of a RFS, and this is not disputed by the Examiner⁴. Therefore,
plainly the approach described in the specification can be used successfully to isolate a cDNA
encoding RFS from diverse genera of plants.

³ The primers used for initial amplification include degenerate positions or inosine bases that recognize A/G or C/T
alternatives.

⁴ Claims 6 and 7, directed to this embodiment (also represented by (c) and (d) in claim 1), are allowed.
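
By way of illustration only, and not as part of the record, the notion of a primer with degenerate A/G or C/T positions may be sketched in Python as follows. The primer string used is hypothetical and is not any of SEQ ID NOS: 11-20; only the standard IUPAC codes R (A or G) and Y (C or T) are modeled, and inosine is omitted for simplicity.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes: R = A or G, Y = C or T.
# Other ambiguity codes and inosine are deliberately left out of this sketch.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT",
}

def expand_primer(primer):
    """Return every concrete sequence represented by a degenerate primer."""
    choices = [AMBIGUITY[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical 8-mer with one A/G position and one C/T position.
variants = expand_primer("ATGRCCYT")
print(len(variants))
print(variants)
```

A primer with two such two-fold positions stands for four concrete sequences, which is why a single degenerate primer set can anneal to related but non-identical templates from different plants.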

The Examiner asserts that one of ordinary skill in the art is not able to distinguish RFS
enzymes from the closely related stachyose synthase enzymes, and therefore the actual
biochemical activity of the proteins encoded by the cDNAs of SEQ ID NOS: 4 and 8 remains
unknown. Accordingly, the Appellants are asserted to have provided only one example of a
RFS-encoding cDNA and therefore, except for the claims directed to the species having proven
activity, the written description of the invention is inadequate.

RFS enzymes are a subfamily of enzymes grouped together with the subfamily of 
stachyose synthases (STS) in a family of glycoside hydrolase enzymes. Appellants have 
presented substantial evidence that one of ordinary skill in the art can distinguish RFS from STS 
members of the glycoside hydrolase family. This evidence is in the form of the data in Tables 1 
and 2 and Figure 1 presented with Appellants' Amendment filed February 11, 2004 and in Table 
3 and Exhibit 1 presented with Appellants' Amendment filed November 15, 2004. These data, 
and the Exhibit supporting the robustness of the analysis, show that RFS enzymes are 
distinguishable from STS enzymes by determination of the degree of sequence identity to SEQ 
ID NO: 1, 3, 5 or 7 according to the present specification. In particular, the data show that RFS 
enzymes among themselves are at least 50% identical at the amino acid level, and that STS 
enzymes are similarly homologous to each other (actually a bit more so, about 65%). However, 
the degree of identity between RFS and STS enzymes is at most about 45%. Sequence identity
analysis permits the artisan of ordinary skill to illustrate the distinction between RFSs and STSs
by a "dendrogram", as shown in Figure 1 attached to the Amendment filed February 11, 2004.
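
For illustration only, the identity-based reasoning above may be sketched in Python. This is a minimal sketch under stated assumptions and is not the analysis of record: the aligned strings are hypothetical, percent identity is computed here over ungapped alignment columns (other definitions exist), and the 50% and 45% cut-offs are taken from the approximate figures stated above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def compare_to_rfs(identity, rfs_floor=50.0, sts_ceiling=45.0):
    """Label an identity value measured against a known RFS sequence."""
    if identity >= rfs_floor:
        return "RFS-like"
    if identity <= sts_ceiling:
        return "not RFS-like"
    return "indeterminate"

# Hypothetical aligned fragments, not taken from the Sequence Listing.
identity = percent_identity("MKV-LA", "MKIALA")
print(identity, compare_to_rfs(identity))
```

The gap between the two cut-offs is left as "indeterminate" so that the sketch does not claim more than the stated figures support.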

Furthermore, Appellants have argued that the specification of the copending application
08/992,914, which discloses additional examples of RFS cDNAs cloned essentially in the
manner described in the present specification, provides another example in which biological
activity of a RFS cDNA⁵ is demonstrated, and that this demonstration further evidences the
effectiveness of the methods described in the present specification in obtaining cDNAs encoding
RFS proteins. (See Appellants' Amendment of June 2, 2006, at page 8, lines 3 ff.) The Examiner
has so far dismissed this further evidence because "each application is to be considered on its
own merits." (See page 5, lines 17-18 of the Final Office Action.) That may be the case, but the
facts of the existence of the '914 application and its working examples may be considered as
evidence in this application.

⁵ From Vicia faba, SEQ ID NO: 1 of the '914 application.

In order to meet the requirements for adequate written description, the specification must 
provide evidence that the inventors "possessed" the invention as claimed at the time the 
application was filed. Vas-Cath v. Mahurkar, 19 USPQ2d 1111 (Fed. Cir. 1991). The
evidentiary standard that must be met by Appellants is only that of the preponderance of the 
evidence. See, In re Oetiker, 24 USPQ2d 1443, 1444 (Fed. Cir. 1992). The Examiner seems to 
be improperly requiring that Appellants meet a higher evidentiary burden, i.e. "clear and 
convincing" evidence or even "beyond reasonable doubt." 

Appellants submit that the evidence of record in the present application firmly establishes 
that it is "more likely than not" that all of the sequences disclosed in the present application are 
those of RFS enzymes. This has been unequivocally established by biochemical assay for one 
disclosed sequence, and sequence similarity as analyzed by one of ordinary skill in the art 
establishes that it is more likely than not true for the others. Furthermore, Appellants have
applied the same approach for cloning RFS-encoding cDNAs in a copending application, where
assay of another expressed cDNA provides yet a further demonstration that the approach obtains
cDNA encoding an RFS enzyme.

Appellants submit that the present specification, by showing reduction to practice of four 
species of the claimed invention obtained from three diverse genera of plants, adequately 
evidences that the inventors "had possession" of the claimed invention at the time the present 
application was filed. Accordingly, the decision of the Examiner that claim 1 is not supported by
adequate written description in the specification should be reversed.

VIIA.2. Claims 4 and 5

Claim 4 of the present application is directed to an isolated nucleic acid comprising a 
nucleotide sequence encoding the amino acid sequence of SEQ ID NO: 3. Claim 5 recites a 
specific polynucleotide sequence of SEQ ID NO: 4 from which SEQ ID NO: 3 is derived. SEQ 
ID NO: 3 is the amino acid sequence of the complete coding region of a cDNA obtained from a 
Chenopodiaceae plant (Beta vulgaris) in Examples 3 and 4. As the sequence is of a complete
protein coding portion of the cDNA, the rejection of these claims is based only upon the 
argument of the Examiner regarding the evidence that the encoded enzyme has RFS activity. 

Appellants have explained above that there is substantial evidence in the record that 
establishes at least to the preponderance of the evidence standard that the amino acid sequence of 
SEQ ID NO: 3, encoded by the cDNA of SEQ ID NO: 4, represents a protein having RFS 
activity. The particular amino acid sequences in question were determined by a cloning method 
generally accepted in the art as useful for cloning functionally homologous proteins across 
species lines. 

Finally, claims 4 and 5 recite specific sequences at either the nucleotide or amino acid 
level. In either case, the skilled artisan can readily determine the exact structure or family of 
structures encompassed by the claims and so there is no question that the inventors "possessed" 
the inventions described in these two claims. 

For all of the above reasons, the rejection of claims 4 and 5 under 35 U.S.C. § 112, first
paragraph, for alleged lack of adequate written description in the specification, should be
reversed.

VIIA.3. Claims 8 and 9 

Claim 8 of the present application is directed to an isolated nucleic acid comprising a 
nucleotide sequence encoding the amino acid sequence of SEQ ID NO: 7. Claim 9 recites a 
specific polynucleotide sequence of SEQ ID NO: 8 from which SEQ ID NO: 7 is derived. SEQ
ID NO: 7 is the amino acid sequence of part of the coding region of a cDNA obtained from a
Cruciferae plant (Brassica napus) in Examples 6 and 7.

Among the arguments presented by the Examiner against claim 1, only the issues of a 
partial sequence and the sufficiency of the evidence that a RFS enzyme is encoded are applicable 
to claims 8 and 9. 

Appellants have explained above that claims 8 and 9 are directed to nucleic acids 
comprising the recited sequences, and thus may include additional nucleotides encoding further 
amino acids as may be necessary to provide an enzyme having RFS activity. Appellants have 
explained previously that the specification provides description of two complete RFS enzyme 
amino acid sequences, and that these data can be used to prepare the additional sequences for 
attachment to the partial sequence of SEQ ID NO: 7. The specification also describes regions of 
high homology that should be present in an enzyme having RFS activity. For the convenience 
of the Board, Appellants present as Exhibit 4 attached hereto an alignment of the amino acid 
sequences of SEQ ID NO: 7 (sc-07) at issue with SEQ ID NO: 5 (sc-05), which represents a 
protein demonstrated by biochemical assay to have activity as a raffinose synthase.⁶ In the
alignment, identical amino acids are shown by *. The regions of high homology within raffinose
synthases described in the specification are indicated by the shaded portions of the sequence.
Appellants submit that the missing 4% of that region (or for that matter the entirety of the
missing amino-terminal portion) may be supplied by the corresponding amino-terminal end
sequences of SEQ ID NO: 3 or 5 as desired by the practitioner of the invention.

Appellants have also explained above that the evidence of record in the present
application is sufficient, at least to the standard of the preponderance of the evidence, to establish
that the amino acid sequence of SEQ ID NO: 7 is that of a RFS enzyme.

⁶ Appellants submit that Exhibit 4 does not constitute "new evidence" as it merely presents data actually present in
the record in the form of SEQ ID NOS: 5 and 7 and otherwise described in the specification. However, Exhibit 4
presents this information in a different format convenient for consideration by the Board.

Finally, claims 8 and 9 recite specific sequences at either the nucleotide or amino acid
level. In either case, the skilled artisan can readily determine the exact structure or family of 
structures encompassed by the claims and so there is no question that the inventors "possessed" 
the inventions described in these two claims. 

For the reasons above, the decision of the Examiner rejecting claims 8 and 9 under 35 
U.S.C. § 112, first paragraph, for lack of adequate written description support by the 
specification, should be reversed.

VIIA.4. Claim 10 

Claim 10 is directed to an isolated nucleic acid comprising the entirety of SEQ ID NO: 4, 
SEQ ID NO: 6 or SEQ ID NO: 8. The scope of claim 10 differs slightly from that of claims 5, 7 
and 9 in that the entire length of the recited nucleotide sequence is recited in claim 10. In 
contrast, claims 5, 7 and 9 recite the coding portions of the nucleotide sequences. 

Appellants' arguments above apply to claim 10 as well.

Claim 10 recites a group of specific structures that are expressly stated in the Sequence 
Listing as originally filed. There can be no doubt that the specification describes these sequences 
exactly and so no doubt that claim 10 meets the requirement for written description of the 
claimed invention. 

For the reasons above, the decision of the Examiner rejecting claim 10 under 35 U.S.C. §
112, first paragraph, for lack of adequate written description support by the specification, should
be reversed.

VIIA.5. Claims 16-23, 28 and 29 

Claims 16-23, 28 and 29 are dependent ultimately from claim 1 and stand rejected for the
same reasons as claim 1 is rejected. These dependent claims are directed to embodiments of the
invention in which a nucleic acid providing a sequence encoding a RFS enzyme is operatively
linked to a promoter (claim 16) or placed into a vector (claim 17), or to a transformed host cell
comprising the nucleic acid of claim 1 either per se, or as part of a promoter-structural gene
construct or as part of a vector (claims 18, 19 and 20, respectively). Claims 21, 22, 28 and 29
further define the nature of the host cell or the nature of the promoter, respectively.

The Examiner has so far presented no reason for rejection of claims 16-23, 28 and 29 
independent from the rejection of claim 1. Thus, the Board is respectfully requested to consider 
that, should the decision of the Examiner with respect to any part (a) through (h) of claim 1 be 
reversed, the dependent claims 16-23, 28 and 29 should be indicated as allowable if rewritten to 
recite the allowable part of claim 1.

VIIB. Rejections under 35 U.S.C. § 112, first paragraph - enablement

Claims 1, 4, 5, 8-10, 16-23, 28 and 29 stand rejected under 35 U.S.C. § 112, first
paragraph, for alleged lack of enablement of the claimed invention by the disclosure of the
specification. The Examiner's position on this issue is essentially that grouping of enzyme
primary structures into families based upon sequence identity is insufficient to support an
assertion that a protein of unconfirmed activity will have the activity ascribed to that family
demonstrated by biochemical assay of at least one of its members. The Examiner therefore
asserts that the present specification is enabling of "how to use the invention" only for those
proteins for which activity as a raffinose synthase is actually demonstrated by biochemical assay.
In the present instance, the Examiner asserts that the claims must be limited with respect to the
amino acid sequence of the enzyme to SEQ ID NO: 5, which is the sole amino acid sequence
described in the present specification for which raffinose synthase activity has been actually
demonstrated.

Appellants disagree.

VIIB.1. Claim 1

The question of enablement is to be considered under a multifactor analysis as set forth in 
In re Wands, 8 USPQ2d 1400 (Fed. Cir. 1988). It is incumbent upon the Examiner to first 
establish a prima facie case for lack of enablement. In re Wright, 27 USPQ2d 1510, 1513 (Fed. 
Cir. 1993) (holding examiner must provide a reasonable explanation as to why the scope of 
protection provided by a claim is not adequately enabled by the disclosure). Should the 
Examiner do so, then the Appellant must establish, by the preponderance of the evidence, that no 
undue experimentation is required to practice the invention as claimed. In re Oetiker, 24
USPQ2d at 1444.

Appellants first submit that the Examiner has never established a proper prima facie case
for lack of enablement of the claimed invention. Proper analysis of the question of enablement
requires consideration of the factors of 1) the breadth of the claims, 2) the nature of the
invention, 3) the level of ordinary skill in the art, 4) the amount of experimentation needed, 5) the
state of the art at the time the invention was made, 6) the amount and quality of guidance provided
by the specification, 7) the presence or absence of working examples and 8) the predictability in
the art. Of these factors, the Examiner has repeatedly addressed only the predictability in the art.
The Examiner's position is that, because there is evidence in the record for RFS activity only for a
protein of amino acid sequence of SEQ ID NO: 5, and the degree of sequence identity among the
amino acid sequences identified in the working examples is as low as 50%, Appellants cannot
reliably assign the biochemical activity of a raffinose synthase to the amino acid sequences of
SEQ ID NOS: 3 and 7.

Appellants note first that analysis of enablement is a question of whether "undue
experimentation" is required to practice the invention throughout its claim scope. The question of
undue experimentation is resolved by weighing all of the several factors enumerated in In
re Wands, 8 USPQ2d 1400 (Fed. Cir. 1988).

The Examiner fails to meet his burden of establishing a prima facie lack of enablement.
The Examiner's analysis of the question of undue experimentation looks only at the factor of
whether working examples of the claimed invention are described in the specification and at an
assertion that it is unpredictable whether any particular nucleic acid produced according to the
teachings of the invention would in fact exhibit raffinose synthase activity. This analysis is
legally insufficient to establish prima facie lack of enablement, as the Examiner fails to consider
the breadth of the claims, the nature of the invention, the level of ordinary skill in the art, the
quantity of the experimentation needed, the guidance provided by the specification (other than
the presence or absence of working examples) and the state of the art at the time the invention
was made. Furthermore, the kind of predictability at issue, namely a priori knowledge of the
functionality of the enzyme obtained using the methods of the invention, is not the kind of
predictability envisioned by the Court in Wands. The instant rejection cannot properly be
sustained against claim 1.

The nature of the invention and the breadth of the claims 

The claimed invention relates to isolated nucleic acids that encode an enzyme having a 
defined biological activity. As to claim 1, the invention as most broadly stated (e.g. (g) or (h))
lies in a nucleic acid that is defined by (1) inclusion of at least certain sequence features (that is, 
the PCR primer sequences that are used to generate the claimed nucleic acid), (2) hybridization 
to a certain reference sequence and (3) encoding a protein having a defined enzymatic activity. 
Claim 1 includes narrower definitions (a) through (f), related to particular amino acid 
sequences. Among descriptions (a) through (f), the nucleic acid as described by amino acid 
sequence SEQ ID NO: 5 ((c) and (d)) is proven to encode a protein having RFS activity, the 
nucleic acid described by reference to SEQ ID NO: 3 ((a) and (b)) represents the entire coding 
sequence of a RFS protein and the nucleic acid described by reference to SEQ ID NO: 7 ((e) 
and (f)) represents about 70% of the length of the coding sequence of a RFS protein. 

Inoperative embodiments are excluded from the claims by the requirement that the 
encoded protein have RFS activity. 

The art of molecular biology, in particular the art of expression of recombinant proteins, 
is one in which the artisan of ordinary skill expects to perform a few weeks or months of 
experimentation in generating variants of a protein, then isolating clones encoding those 
variants and then (perhaps) re-cloning the isolated variants into vectors for expressing a protein, 
and then screening expressed proteins for activity. 

The level of ordinary skill in the art 

The artisan of ordinary skill in the art of cloning and expressing recombinant proteins is 
generally accepted as one having a Ph.D. degree and perhaps more, i.e., significant 
post-doctoral laboratory experience. Such a person is skilled in designing and performing 
experiments for isolating DNA clones and for screening them for a desired property, for 
example encoding a protein having a particular activity. 

The amount of experimentation needed 

The amount of experimentation needed to practice the present invention is not unduly 
large or burdensome. The practitioner must isolate a template genomic DNA or RNA from an 
organism, perform a polymerase chain reaction using primers described in the specification to 
generate an amplified fragment, clone that fragment into an expression vector, express the 
encoded protein and then screen the protein for activity as a raffinose synthase. All of these 
steps are either well-known in the art or described in detail in the specification (e.g. pp. 31-33 
(bacterial expression of the cloned cDNA and assay for RFS activity) and Examples 1-6 
beginning at p. 38) and furthermore are expected to be performed by the artisan of ordinary 
skill. 

The state of the art at the time the invention was made 

At the time the invention was made, the state of the art of molecular biology was such 
that the various laboratory operations that must be performed to carry out the experimentation 
required to practice the instant invention, i.e. cloning of DNA molecules and expressing them in 
a host cell, were routine. Also, polymerase chain reaction amplification of nucleic acids was 
routine. 

The raffinose content of a number of organisms, especially including plants and some 
algae, was known. The biochemistry of raffinose synthesis in plants had been established, and 
the role of raffinose synthases as rate-limiting of raffinose production was known. (See, e.g. pp. 
1-2 of the specification.) 

A biochemical assay for raffinose synthase activity was described. See Lehle et al., Eur. 
J. Biochem. 38:103 (1973) (attached). 

The guidance provided by the specification including the presence or absence of working 
examples 

The specification provides ample guidance to the skilled artisan for practicing the 
invention broadly. In particular, the specification discloses in detail how to clone DNAs 
encoding putative raffinose synthase enzymes. The specification provides details such as 
organisms likely to be useful for isolating template genomic DNA or RNA from plants 
commensurate in scope with claim 1 and corresponding PCR primers (Chenopodiaceae (for beet), 
p. 11, line 14; Cruciferae (for mustard and rapeseed), p. 13, line 18 and associated PCR primers 
in Lists 2 and 3).^ The specification describes methods for cloning DNA encoding a putative 
raffinose synthase enzyme from an RNA fraction, including an extensive list of primers that can 
be utilized for PCR amplification from templates obtained from different organisms (see, e.g. 
Lists 6 and 7 at p. 43; Lists 8 and 9 at page 46; List 10 at p. 47). The specification describes 
methods for expressing the cloned DNA in plant cells and in bacteria (see, e.g. pages 29 to 37). 
The specification describes a biochemical assay for raffinose synthase, referring to the Lehle 
article noted above and summarizing the procedure beginning at the bottom of page 31. 

The specification also provides a number of working examples of isolation of partial or 
complete raffinose synthase genes from a number of different plants (see Examples 1-7) and of 
creation of an expression vector for use in plants (Example 8) and transformation of a plant 
(mustard) with a cloned DNA encoding a raffinose synthase (Example 9). 



^ The specification also includes information useful for obtaining RFS cDNA from soybean. 

The predictability in the art 

The Examiner asserts that the art of recombinant DNA cloning and recombinant protein 
expression is unpredictable. The Examiner argues that a practitioner of the invention must 
engage in trial and error experimentation to identify cloned DNAs that encode functional 
raffinose synthase genes. 

The Examiner's argument is simply incorrect. First, the skilled artisan can follow 
detailed teachings in the specification of how to clone, express and evaluate DNAs that are 
likely to encode functional raffinose synthase enzymes. It is true that it is a bit unpredictable 
whether any individual clone made in an experiment will include a DNA encoding a functional 
enzyme, but it is not unpredictable whether the skilled artisan would succeed in identifying at 
least one functional DNA in an experiment as a whole. To the contrary, it is very likely that the 
skilled artisan would find a cloned DNA encoding a functional enzyme by following the 
teachings of the specification. Appellants note that the experimental approach described in the 
specification resulted in identification of four cDNAs described in this application and 
additional cDNAs as described in the copending '914 application. 

The Board might consider certain details from the Wands case. In Wands, an invention 
related to isolation of hybridomas that secreted a particular antibody was deemed broadly 
enabled even though extensive screening of many cloned cell lines was necessary and the 
success rate of the screening was only 2.8%, including experiments that failed to generate any 
operable clones at all. The Wands panel expressly stated that experimentation that is expected 
to be performed by the artisan of ordinary skill, such as the cloning and screening experiments 
described in the present application, is not undue experimentation. 
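
The distinction drawn above between the predictability of any single clone and the predictability of a screening experiment as a whole can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each screened clone is an independent trial with the same per-clone success rate. The function name and the numbers of clones screened are hypothetical and are not taken from the specification or from Wands; only the 2.8% figure comes from the discussion above.

```python
def chance_of_at_least_one(per_clone_rate: float, clones_screened: int) -> float:
    """Probability that at least one of the screened clones is functional.

    Assumes (hypothetically) that clones are independent trials that share
    the same per-clone success rate.
    """
    if not 0.0 <= per_clone_rate <= 1.0:
        raise ValueError("per_clone_rate must lie between 0 and 1")
    if clones_screened < 0:
        raise ValueError("clones_screened must not be negative")
    # Complement rule: 1 minus the chance that every screened clone fails.
    return 1.0 - (1.0 - per_clone_rate) ** clones_screened


if __name__ == "__main__":
    # Illustrative clone counts only; 0.028 is the 2.8% rate noted above.
    for n in (1, 50, 100, 200):
        print(n, round(chance_of_at_least_one(0.028, n), 3))
```

Under these assumptions the chance of at least one success rises steadily with the number of clones screened, even though the outcome for any individual clone remains uncertain.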

Appellants submit that a proper weighing of the Wands factors will lead the Board to a 
proper conclusion that no undue experimentation is required to practice the present invention as 
claimed in claim 1. Accordingly, the Examiner's decision rejecting claim 1 for lack of 
enablement should be reversed because the Examiner failed to establish a prima facie lack of 
enablement. 

Furthermore, Appellants have provided evidence, in the form of the Watanabe 
Declaration attached to their Amendment of February 11, 2004, to support an assertion that the 
procedures described in the specification result in cloning of cDNAs encoding RFS enzymes. 
Appellants have also provided evidence that one of ordinary skill in the art can readily 
distinguish a RFS from a STS or another class of closely related proteins, Seed Imbibition 
Proteins (SIPs). The data in Figure 1 attached to Appellants' Amendment of February 11, 2004, 
and submitted as part of the Nagasawa Declaration (copied from the '914 application file and 
submitted with Appellants' Amendment of June 2, 2006) demonstrate unequivocally that the 
RFS subfamily of glycoside hydrolases (see Appellants' discussion of Peterbauer et al., below) is 
easily distinguished from the STS or SIP subfamilies of glycoside hydrolases on the basis that 
RFSs are more similar to each other, and STSs are more similar to each other, than RFSs are 
similar to STSs. This relationship among their amino acid sequences can be used to construct a 
"molecular phylogenetic tree" upon a branch of which any particular amino acid sequence 
thought to represent a RFS or STS (or SIP) can be placed. The Nagasawa Declaration further 
explains that this analysis is robust in its conclusions (though perhaps the specific degrees of 
sequence similarity may vary) to three different approaches to sequence similarity analysis. 
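
The grouping logic described above (members of a subfamily resemble one another more closely than they resemble members of another subfamily) can be sketched in a few lines. This is a simplified illustration only: it is not the analysis of the Nagasawa Declaration, it does not build a phylogenetic tree, and it assumes the sequences have already been aligned to equal length by some other tool. The function names are hypothetical, and any sequences supplied to it for demonstration would be toy strings rather than real RFS, STS or SIP sequences.

```python
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Percent of compared columns with identical residues.

    Assumes the two sequences are already aligned to the same length;
    columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)


def closest_group(query: str, groups: dict[str, list[str]]) -> str:
    """Name of the group whose members are, on average, most identical to the query."""
    return max(
        groups,
        key=lambda name: mean(percent_identity(query, m) for m in groups[name]),
    )
```

A query sequence is assigned to whichever group shows the highest mean identity to it; a tree-based analysis of the kind referred to above serves the same purpose with more rigor.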

The Examiner has attempted to support his position regarding unpredictability in the art 
with evidence from the scientific literature. The Examiner has cited Richmond et al., Plant 
Physiology (2000) and Duggleby, Gene (1997) for a general assertion that, "The art teaches that 
one of skill in the art cannot assume the function of the polypeptide encoded by an isolated 
nucleic acid solely based on sequence similarity to a known polypeptide sequence." The 
Examiner cites Peterbauer et al., Planta (2002) for the proposition that RFS enzymes have high 
sequence homology to STSs and SIPs. The Examiner cites Peterbauer et al., Planta (1999) for 
the proposition that their group was the first to isolate a nucleic acid encoding a STS protein. 
See the Office Action of August 11, 2003 at p. 3. The Examiner cites Bowie et al., Science 
(1990) for the proposition that it is the three dimensional structure of an enzyme that confers its 
activity and that folding of a protein can be sensitive to minor changes in sequence. The 
Examiner cites Lazar, Mol. Cell. Biol. (1998) as an example of an instance in which a certain 
change in sequence to TGF-α had no effect on the protein, but another instance of amino acid 
substitution "sharply reduced biological activity". The Examiner cites Broun et al., Science 
(1998) for the proposition that a few amino acid substitutions can have radical effects on the 
activity of an enzyme. See the Office Action of November 20, 2002, at pp. 6-7. 

Appellants do not dispute the general conclusion reached from the Bowie, Lazar and 
Broun papers that it is the three-dimensional structure of a protein that confers its biological 
activity, or that sometimes there are particular amino acids that must be conserved in the linear 
sequence to preserve the correct folding of the protein, or even that in some instances two 
distinct enzymes will share extensive portions of amino acid sequences. These concepts are 
well-known to the molecular biologist of ordinary skill in the art and they do suggest that it is 
somewhat unpredictable whether mutating a protein will result in maintaining, lessening or 
improving its biological activity. However, this is not determinative of whether undue 
experimentation is required to practice the instant invention. All that such unpredictability 
establishes is that, without actual assay data, one cannot say beyond reasonable doubt that a 
mutated protein will retain its original activity. However, this is not the proper standard of 
evidence to consider during patent prosecution. Appellants' burden is only to establish that it is 
more likely than not that the amino acid sequences of SEQ ID Nos: 3 and 7 represent a protein 
having RFS activity, or that a cDNA obtained as described in parts (g) and (h) of claim 1 encodes 
such a protein. 

The Examiner asserts that Richmond et al. indicates that more than sequence similarity is 
needed as evidence of function, pointing out the paragraph bridging the left and right columns of 
page 497. Appellants see here only a description of domains present in members of the cellulose 
synthase family of proteins. Indeed, Richmond might be interpreted as more supportive of 
Appellants' position that sequence similarity is a useful tool for grouping proteins by activity. 
The Board might take note of Figure 1 of the paper, showing assignment of members of the 
family to subfamilies CesA, CesB, CesD, etc. based upon a molecular phylogeny. The Board 
may usefully compare Figure 1 of Richmond with Figure 1 attached to the Nagasawa 
Declaration, which shows a similar molecular phylogeny among RFSs, STSs and a SIP, with the 
result of clear separation of the three groups of enzymes. 

The Examiner points out the last paragraph of Duggleby. There, the author states, 
"Ultimately the function of any DNA sequence, whose identity is based solely on homology, can 
only be proven by experiments designed to evaluate that function." Again, this simply goes to 
the standard of proof. For purposes of alleging utility in a patent application, the standard of 
proof is merely the preponderance of the evidence. Appellants note that Duggleby has no 
problem asserting function from sequence similarity. The Board might consider the text of the 
Note Added In Proof: "Recent examination of GenBank expressed sequence tags has identified 
three sequences . . . that may represent higher plant ALS small subunits. The last of these gives a 
very good match to the P. purpurea sequence; over residues 83-154 there are 46 identical, and 10 
similar, amino acids." The Board might further note that the author's conclusion is based upon a 
degree of identity of only 71% at the amino acid level of a partial amino acid sequence. 

The Duggleby paper describes study of the small subunit of the acetolactate synthase 
(ALS) from a bacterium, yeast and an alga. The paper provides an alignment of the genes from 
these three organisms (Figure 2). The authors note that there is only "limited similarity" among 
the three sequences, but nonetheless were able to detect a number of known bacterial ALS genes 
and also discovered the eukaryotic versions of the gene using a BLAST search of GENBANK 
and the bacterial sequence (B. flavum) as a query. See p. 247, under Results and Discussion. 
Thus, Duggleby in fact also supports Appellants' assertion that comparison of sequence data is a 
common technique in the art for predicting biochemical function of a protein. ("These results 
clearly indicate that S. cerevisiae and P. purpurea contain a gene that could encode an ALS 
small subunit." (at the top of the right column on p. 247.)) 

Peterbauer (2002) describes isolation of a raffinose synthase gene from P. sativum (pea). 
The Examiner asserts that Peterbauer teaches that RFSs, STSs and SIPs demonstrate high overall 
sequence homology. This has not been disputed by Appellants. Peterbauer discusses this result 
in terms of assignment of all three of these enzyme types to the glycoside hydrolase enzyme 
family (p. 841, right column, above Figure 1). Appellants' argument is that RFSs are more alike, 
and STSs are more alike, than RFSs resemble STSs and therefore these members of the 
glycoside hydrolase family are distinguishable subfamilies. 

Appellants note that the Examiner has read Peterbauer (2002) rather selectively. At the 
top of the right column on p. 841, Peterbauer easily distinguishes a STS transcript from a RFS 
transcript on the basis of sequence identity. 

In fact, Peterbauer uses an approach to cloning the pea RFS gene that is similar to that 
described in the present specification. That is, PCR primers designed from the amino acid 
sequence of the RFS were used to amplify template DNA from the pea plant. Then the resulting 
cDNA was expressed in a cell and the protein so produced was assayed for RFS activity. These 
teachings may usefully be compared with the working Examples 1-6 of the present specification 
and the Watanabe Declaration. 

Peterbauer (2002) does not particularly support the Examiner's position. The authors 
note that, "to distinguish between raffinose synthase and stachyose synthase, the primers were 
chosen to encompass a block of about 80 amino acids, which is exclusively present in stachyose 
synthases." (Top of page 841, right column.) This establishes that there are in fact amino acid 
sequence elements that serve to distinguish a RFS from a STS. Second, the Examiner has read 
the paper very selectively, urging the data showing sequence similarity, but ignoring for 
example, the text at the top of the right column of p. 841, "To isolate a cDNA encoding for 
raffinose synthase by reverse transcription-PCR, degenerate primers were designed based upon 
amino acid motifs conserved among Cucumis sativa raffinose synthase, stachyose synthase and 
related sequences." Thus, Peterbauer et al. were satisfied that they could reliably distinguish 
among such sequences either by biochemical or sequence analysis methods. 

Thus, none of the papers proffered by the Examiner in rebuttal of Appellants' arguments 
is effective to undermine either their argument that the specification is enabling of practice of the 
invention, or the evidence of the Nagasawa Declaration that one of ordinary skill in the art can 
readily determine by amino acid sequence analysis whether a given amino acid sequence 
represents a RFS, a STS or a SIP. 

Since the Examiner has in the first instance failed to establish a prima facie lack of 
enablement of the claimed invention, and in the second instance has failed to effectively rebut 
Appellants' arguments and evidence offered in support of enablement of the claimed invention, 
the present rejection of claim 1 under 35 U.S.C. § 112, first paragraph, for alleged lack of 
enablement, must be reversed. 

VII.B.2 - Claim 4 

Claim 4 is directed to isolated nucleic acids encoding the amino acid sequence of SEQ ID 
NO: 3. All of Appellants' arguments against the Examiner's rejection of claim 1 for lack of 
enablement are applicable as well to claim 4. However, the breadth of this claim is substantially 
narrower than the breadth of claim 1. Also, the amino acid sequence of SEQ ID NO: 3 is of the 
complete length of the protein and the degree of sequence identity to SEQ ID NO: 5, proven to 
represent an enzyme having RFS activity in the Watanabe Declaration, is 63%, substantially 
higher than the degree of identity between a RFS and STS (see Table 2 attached to Appellants' 
Amendment of February 11, 2004). Therefore, the degree of unpredictability as to whether SEQ 
ID NO: 3 encodes a RFS enzyme or not may be considered to be lower than that for claim 1 as a 
whole, and so enablement of claim 4 should be weighed separately from enablement of claim 1. 

For all of the reasons above, Appellants urge that the Examiner's decision that the 
specification fails to enable claim 4 should be reversed. 

VII.B.3 - Claim 5 

Claim 5 is directed to an isolated nucleic acid comprising a recited portion of sequence of 
SEQ ID NO: 4. All of Appellants' arguments against the Examiner's rejection of claim 1 for 
lack of enablement are applicable as well to claim 5. However, the breadth of this claim is 
substantially narrower than the breadth of claim 1. Also, the recited portion of SEQ ID NO: 4 
encodes the complete length of SEQ ID NO: 3, a full-length RFS having a degree of sequence 
identity to SEQ ID NO: 5, proven to represent an enzyme having RFS activity in the Watanabe 
Declaration, of 63%, substantially higher than the degree of identity between a RFS and STS 
(see Table 2 attached to Appellants' Amendment of February 11, 2004). Therefore, the degree 
of unpredictability as to whether SEQ ID NO: 4 encodes a RFS enzyme or not may be 
considered to be lower than that for claim 1 as a whole. 

Furthermore, the only experimentation necessary to determine conclusively whether the 
sequence SEQ ID NO: 4 in fact does encode a RFS enzyme is to clone this sequence into an 
expression vector, transform a bacterial or plant host cell with the vector and test the transformed 
bacteria or plant tissue for expression of RFS activity in the manner described in the 
specification. (See, e.g. pp. 31-37 of the specification.) Such experimentation must be 
considered well-guided by the specification and expected by the artisan of ordinary skill, and so 
not "undue". Accordingly, enablement of claim 5 should be weighed separately from 
enablement of claim 1. 

For all of the reasons above, Appellants urge that the Examiner's decision that the 
specification fails to enable claim 5 should be reversed. 

Also, the specification, at page 26, line 13 to page 28, line 21, describes use of nucleic 
acids of the invention in genotyping analysis or for detection of mutation in raffinose synthase 
genes or for marking cloned plant varieties. These utilities are independent of whether or not the 
cloned DNA encodes a protein having raffinose synthase activity; for example, a nucleic acid 
encoding only a part of a raffinose synthase gene is adequate for use in such methods. At least 
for genotyping and plant variety identification even nucleic acids unrelated to raffinose synthase 
genes are useful. Therefore, the Board should consider that the specification provides adequate 
description of how to use the nucleic acid of claim 5 and the Examiner's decision to the contrary 
may be reversed for this reason alone. 

VII.B.5 - Claim 8 

Claim 8 is directed to isolated nucleic acids encoding the amino acid sequence of SEQ ID 
NO: 7. All of Appellants' arguments against the Examiner's rejection of claim 1 for lack of 
enablement are applicable as well to claim 8. However, the breadth of claim 8 is substantially 
narrower than the breadth of claim 1. Therefore, the degree of unpredictability as to whether 
SEQ ID NO: 8 encodes a RFS enzyme or not may be considered to be lower than that for claim 1 
as a whole, and so enablement of claim 8 should be weighed separately from enablement of 
claim 1. 

VII.B.6 - Claim 9 

Claim 9 is directed to an isolated nucleic acid comprising a recited portion of sequence of 
SEQ ID NO: 8. All of Appellants' arguments against the Examiner's rejection of claim 1 for 
lack of enablement are applicable as well to claim 9. However, the breadth of this claim is 
substantially narrower than the breadth of claim 1. Therefore, the degree of unpredictability as 
to whether SEQ ID NO: 8 encodes a RFS enzyme or not may be considered to be lower than that 
for claim 1 as a whole, and so enablement of claim 9 should be weighed separately from 
enablement of claim 1. 

Furthermore, the only experimentation necessary to determine conclusively whether the 
sequence SEQ ID NO: 8 in fact does encode a RFS enzyme is to clone this sequence into an 
expression vector, transform a bacterial or plant host cell with the vector and test the transformed 
bacteria or plant tissue for expression of RFS activity in the manner described in the 
specification. (See, e.g. pp. 31-37 of the specification.) Such experimentation must be 
considered well-guided by the specification and expected by the artisan of ordinary skill and so 
not "undue". Accordingly, the Board should weigh enablement of claim 9 separately from 
enablement of claim 1. 

For all of the reasons above, Appellants urge that the Examiner's decision that the 
specification fails to enable claim 9 should be reversed. 

Also, the specification, at page 26, line 13 to page 28, line 21, describes use of nucleic 
acids of the invention in genotyping analysis or for detection of mutation in raffinose synthase 
genes or for marking cloned plant varieties. These utilities are independent of whether or not the 
cloned DNA encodes a protein having raffinose synthase activity; for example, a nucleic acid 
encoding only a part of a raffinose synthase gene is adequate for use in such methods. At least 
for genotyping and plant variety identification even nucleic acids unrelated to raffinose synthase 
genes are useful. Therefore, the Board should consider that the specification provides adequate 
description of how to use the nucleic acid of claim 9 and the Examiner's decision to the contrary 
may be reversed for this reason alone. 

VII.B.7 - Claim 10 

Claim 10 is directed to an isolated nucleic acid comprising the entirety of any one of SEQ 
ID Nos: 4, 6 or 8. All of Appellants' arguments against the Examiner's rejection of claim 1 for 
lack of enablement are applicable as well to claim 10. However, the breadth of this claim is 
substantially narrower than the breadth of claim 1. Therefore, the degree of unpredictability as 
to whether SEQ ID Nos: 4 and 8 encode a RFS enzyme or not may be considered to be lower 
than that for claim 1 as a whole, and so enablement of claim 10 should be weighed separately 
from enablement of claim 1. 

Furthermore, the only experimentation necessary to determine conclusively whether the 
sequences SEQ ID Nos: 4 and 8 in fact do encode a RFS enzyme is to clone these sequences into 
an expression vector, transform a bacterial or plant host cell with the vector and test the 
transformed bacteria or plant tissue for expression of RFS activity in the manner described in the 
specification. (See, e.g. pp. 31-37 of the specification.) Such experimentation must be 
considered well-guided by the specification and expected by one of ordinary skill in the art and 
so not "undue". Accordingly, the Board should weigh enablement of claim 10 separately from 
enablement of claim 1. 

For all of the reasons above, Appellants urge that the Examiner's decision that the 
specification fails to enable claim 10 should be reversed. 

Also, the specification, at page 26, line 13 to page 28, line 21, describes use of nucleic 
acids of the invention in genotyping analysis or for detection of mutation in raffinose synthase 
genes or for marking cloned plant varieties. These utilities are independent of whether or not the 
cloned DNA encodes a protein having raffinose synthase activity; for example, a nucleic acid 
encoding only a part of a raffinose synthase gene is adequate for use in such methods. At least 
for genotyping and plant variety identification even nucleic acids unrelated to raffinose synthase 
genes are useful. Therefore, the Board should consider that the specification provides adequate 
description of how to use the nucleic acid of claim 10 and the Examiner's decision to the 
contrary may be reversed for this reason alone. 

VII.C.8 - Claims 16-23, 28 and 29 

Claims 16-23, 28 and 29 are dependent ultimately from claim 1 and stand rejected for the 
same reasons as claim 1 is rejected. These dependent claims are directed to embodiments of the 
invention in which a nucleic acid providing a sequence encoding a RFS enzyme is operatively 
linked to a promoter (claim 16) or placed into a vector (claim 17) or to a transformed host cell 
comprising the nucleic acid of claim 1 either per se, or as part of a promoter-structural gene 
construct or as part of a vector (claims 18, 19 and 20, respectively). Claims 22, 23, 28 and 29 
further define the nature of the host cell or the nature of the promoter, respectively. 

The Examiner has so far presented no reason for rejection of claims 16-23, 28 and 29 
independent from the rejection of claim 1. Thus, the Board is respectfully requested to consider 
that, should the decision of the Examiner with respect to any part (a) through (h) of claim 1 be 
reversed, the dependent claims 16-23, 28 and 29 should be indicated as allowable if rewritten to 
recite the allowable part of claim 1. 

VII.C - Summary and Conclusion 

Claims 1, 4, 5, 8-10, 16-23, 28 and 29 stand rejected under 35 U.S.C. § 112, first 
paragraph, for alleged lack of adequate written description of the invention. The Examiner's 
argument on this issue is that the specification fails to describe any particular amino acid 
sequence that defines a protein as having raffinose synthase activity, and therefore the generic 
invention is not described. 

Appellants submit that this argument is not persuasive. In the first instance, the 
specification asserts that the defined sequences in SEQ ID Nos: 1-8 (of which SEQ ID Nos: 3-8 
are recited in claims) define nucleic acids according to the invention, either at the nucleic acid or 
at the amino acid level. Appellants submit that specific description of a structure constitutes 
substantial evidence that they "possess" the invention so described and have placed such an 
invention in the hands of the public. Vas-Cath v. Mahurkar, 19 USPQ2d 1111 (Fed. Cir. 1991). 

Furthermore, the specification describes a number of PCR primers, derived from the data 
of SEQ ID Nos: 2, 4, 6 and 8 or otherwise, that are useful when applied to template nucleic acids 
from plant types associated with the primer sequences as described in the specification, to obtain 
further cloned cDNAs encoding raffinose synthase enzymes. The specification also describes 
how to test any nucleic acids obtained by such a technique for activity of a raffinose synthase. 
Therefore, the invention is at the very least well-described in "product-by-process" terms. Fiers 
v. Revel, 25 USPQ2d at 1605. One may also consider that the PCR primers represent minimal 
nucleotide sequences that must be present to define a nucleic acid as one encoding a raffinose 
synthase. Also, the specification, at pages 20-21, describes particular regions of amino acid 
sequence that should have high homology to SEQ ID NO: 3, which is an amino acid sequence 
shown by Declaration evidence to represent a protein having RFS activity. Therefore, to this 
degree at least, a "structure-function" relationship is described in the specification. 

Thus, Appellants submit that the specification meets the legal standard for adequate 
written description of the claimed invention, i.e. it evidences that the inventors were in 
possession of the invention as claimed. Accordingly, the rejection of claims 1, 4, 5, 8-10, 16-23, 
28 and 29 under 35 U.S.C. § 112, first paragraph, for alleged lack of written description support, 
should be reversed. 

Claims 1, 4, 5, 8-10, 16-23, 28 and 29 also are rejected under 35 U.S.C. § 112, first 
paragraph, for alleged lack of enablement. The Examiner's position is essentially that, since one 
of ordinary skill in the art is unable to distinguish a nucleic acid encoding a raffinose synthase 
enzyme from a nucleic acid encoding a stachyose synthase enzyme based only on a degree of 
sequence identity, the specification fails to teach the skilled artisan how to use the present 
invention. 

This rejection fails in the first instance because the Examiner fails to establish any prima 
facie lack of enablement. Proper consideration of the question of enablement requires 
establishing that undue experimentation is required to practice the full scope of the invention. 
This question is addressed by considering a number of factors. In re Wands, 8 USPQ2d at 1400. 

However, the Examiner's explanation of the rejection addresses only the question of 
whether one of ordinary skill in the art, having a particular nucleic acid in hand, can predict, 
based upon its sequence, whether or not that nucleic acid encodes a raffinose synthase enzyme, 
or whether instead it encodes a stachyose synthase. Such analysis ignores the other factors to be 
considered. 

On the other hand, Appellants explain that the specification is enabling of the claimed 
invention, addressing the remaining considerations required under Wands. Appellants also 
present evidence to support an allegation that the skilled artisan, using the teachings of the 
specification in a manner accepted in the art at the time the invention was made (e.g., molecular 
phylogeny based upon degree of amino acid sequence similarity) can easily distinguish a 
raffinose synthase enzyme from a stachyose synthase enzyme. Appellants also point out that the 
specification provides express guidance of how to determine biochemically if a protein expressed 
from a cloned nucleic acid exhibits activity of a raffinose synthase. Furthermore, as to claims 5, 
9 and 10, directed to particular nucleic acids encoding raffinose synthase enzymes, the 
specification describes utilities for the cloned nucleic acids that are independent of whether they 
actually encode a functional enzyme. For these three claims, the Examiner's entire rationale for 
making the rejection fails. Therefore it is plainly established that the present specification is 
enabling of the claimed invention and so the rejection of claims 1, 4, 5, 8-10, 16-23, 28 and 29 
under 35 U.S.C. § 112, first paragraph, for alleged lack of enablement, must be reversed. 

Reversal of all of the rejections of claims 1, 4, 5, 8-10, 16-23, 28 and 29 under 
35 U.S.C. § 112, first paragraph, for alleged lack of written description support and 
for alleged lack of enablement, and remand to the Examiner for allowance of all of the pending 
claims, are respectfully requested. 

VIII. CLAIMS 

A copy of the claims involved in the present appeal is attached hereto as Appendix A. 
The claims in Appendix A are as addressed by the Examiner in the Final Office Action of 
August 23, 2006. 

IX. EVIDENCE 

A copy of evidence pursuant to §§ 1.130, 1.131, or 1.132 and/or evidence entered by or 
relied upon by the examiner that is relevant to this appeal is attached hereto as Appendix B. 

1. Tables 1 and 2 and Figure 1, which were presented attached to Appellants' paper of 
February 11, 2004. 

2. Watanabe Declaration, presented attached to Appellants' paper of February 11, 2004. 

3. Exhibit 1, explanation of various sequence analysis programs, attached to Appellants' 
paper of November 15, 2004. 

4. Lehle and Tanner, Eur. J. Biochem. 38:103-110 (1973), cited at the bottom of page 31 
of Specification. 

5. Declaration of Akitsu NAGASAWA, copied from the copending application 
09/301,714 and attached to Appellants' paper of June 2, 2006. 

6. Richmond et al., Plant Physiol. 124:495-498 (2000), cited by the Examiner in the 
Office Action of August 11, 2003. 

7. Duggleby, Gene 190:245-249 (1997), cited by the Examiner in the Office Actions of 
February 6, 2002 and November 20, 2002. 

8. Bowie et al., Science 247:1306-1310 (1990), cited by the Examiner in the Office 
Action of November 20, 2002. 

9. Lazar et al., Molecular and Cellular Biology 8:1247-1252 (1988), cited by the Examiner 
in the Office Action of November 20, 2002. 

10. Broun et al., Science 282:1315-1317 (1998), cited by the Examiner in the Office 
Action of November 20, 2002. 

11. Peterbauer et al., Planta 215:839-846 (2002), cited by the Examiner in the Office 
Actions of August 11, 2003 and December 2, 2005. 

12. Exhibit 4, alignment of SEQ ID NO: 5 of the instant application with SEQ ID NO: 7 
of the instant application. 

X. RELATED PROCEEDINGS 

There are no prior decisions of any Court or of the Board of Appeals and Interferences in 
this matter. 

Dated: July 23, 2007 Respectfully submitted, 



By 

Mark J. Nuell 
Registration No.: 36,623 

BIRCH, STEWART, KOLASCH & BIRCH, LLP 
8110 Gatehouse Road 
Suite 100 East 
P.O. Box 747 

Falls Church, Virginia 22040-0747 
(703) 205-8000 
Attorney for Applicant 

APPENDIX A 

Claims Involved in the Appeal of Application Serial No. 09/301,766 

The pending claims 1, 4-10, 16-23, 28 and 29 are set forth below as amended on June 2, 2006. 
Claims 1, 4, 5, 8-10, 16-23, 28 and 29 are on appeal: 

1. An isolated nucleic acid which comprises a polynucleotide encoding a protein that 
binds a D-galactosyl group through the α(1→6) bond to the hydroxyl group attached to the 
carbon atom at 6-position of the D-glucose residue in a sucrose molecule to form raffinose, 
wherein said polynucleotide comprises a nucleotide sequence selected from the group consisting 
of: 

(a) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID 
NO: 3, 

(b) a nucleotide sequence depicted by the 236th to 2584th nucleotides in the nucleotide 
sequence as depicted in SEQ ID NO: 4, 

(c) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID 
NO: 5, 

(d) a nucleotide sequence depicted by the 134th to 2467th nucleotides in the nucleotide 
sequence as depicted in SEQ ID NO: 6, 

(e) a nucleotide sequence encoding the amino acid sequence as depicted in SEQ ID 
NO: 7, 

(f) a nucleotide sequence depicted by the 1st to 1719th nucleotides in the nucleotide 
sequence as depicted in SEQ ID NO: 8, 

(g) a nucleotide sequence obtained from a polynucleotide which is amplified from a 
nucleic acid obtained from beet with a combination of a PCR primer selected 
from the group consisting of SEQ ID NO: 11 and SEQ ID NO: 13 and a PCR 
primer selected from the group consisting of SEQ ID NO: 12 and SEQ ID NO: 
14, wherein said nucleotide sequence hybridizes with a nucleotide sequence 
complementary to the nucleotide sequence of (a) or (b), in a buffer comprising 
0.9M NaCl and 0.09M citric acid at 65°C to 68°C, and 

(h) a nucleotide sequence obtained from a polynucleotide which is amplified from a 
nucleic acid obtained from mustard or rapeseed with a combination of a PCR 
primer selected from the group consisting of SEQ ID NO: 15, SEQ ID NO: 17 
and SEQ ID NO: 19 and a PCR primer selected from the group consisting of SEQ 
ID NO: 16, SEQ ID NO: 18 and SEQ ID NO: 20, wherein said nucleotide 
sequence hybridizes with a nucleotide sequence complementary to the nucleotide 
sequence of any one of (c) to (f), in a buffer comprising 0.9M NaCl and 0.09M 
citric acid at 65°C to 68°C. 

4. An isolated nucleic acid comprising a nucleotide sequence encoding the amino acid 
sequence as depicted in SEQ ID NO: 3. 

5. An isolated nucleic acid comprising the nucleotide sequence depicted by the 
236th to 2584th nucleotides in the nucleotide sequence as depicted in SEQ ID NO: 4. 

6. (Allowed) An isolated nucleic acid comprising a nucleotide sequence encoding the 
amino acid sequence as depicted in SEQ ID NO: 5. 

7. (Allowed) An isolated nucleic acid comprising the nucleotide sequence depicted by the 134th to 
2467th nucleotides in the nucleotide sequence as depicted in SEQ ID NO: 6. 

8. An isolated nucleic acid comprising a nucleotide sequence encoding the amino acid 
sequence as depicted in SEQ ID NO: 7. 



9. An isolated nucleic acid comprising the nucleotide sequence depicted by the 1st to 
1719th nucleotides in the nucleotide sequence as depicted in SEQ ID NO: 8. 

10. An isolated nucleic acid comprising the nucleotide sequence as depicted in SEQ ID 
NO: 4, SEQ ID NO: 6, or SEQ ID NO: 8. 

16. An isolated nucleic acid comprising the nucleic acid of claim 1, which is operatively 
linked to a promoter. 

17. A vector comprising the nucleic acid of claim 1. 

18. A transformant, wherein the nucleic acid of claim 1 is introduced into a host cell. 

19. A transformant, wherein the nucleic acid of claim 16 is introduced into a host cell. 

20. A transformant, wherein the vector of claim 17 is introduced into a host cell. 

21. The transformant of claim 18, wherein the host cell is a microorganism. 

22. The transformant of claim 18, wherein the host cell is a plant cell. 

23. A method for producing a raffinose synthase which comprises the steps of: 
culturing or growing the transformant of claim 18 to produce the raffinose synthase, and 
collecting the raffinose synthase. 

28. The nucleic acid of claim 16, wherein said promoter is effective in a plant cell. 

29. The nucleic acid of claim 16, wherein said promoter is effective in a yeast cell. 

APPENDIX B 

The following items are of record as evidence in the present application and are attached 
hereto in support of Appellants' Appeal Brief: 

1. Tables 1 and 2 and Figure 1, which were presented attached to Appellants' paper of 
February 11, 2004. 

2. Watanabe Declaration, presented attached to Appellants' paper of February 11, 2004. 

3. Exhibit 1, explanation of various sequence analysis programs, attached to Appellants' 
paper of November 15, 2004. 

4. Lehle and Tanner, Eur. J. Biochem. 38:103-110 (1973), cited at the bottom of page 31 
of Specification. 

5. Declaration of Akitsu NAGASAWA, copied from the copending application 
09/301,714 and attached to Appellants' paper of June 2, 2006. 

6. Richmond et al., Plant Physiol. 124:495-498 (2000), cited by the Examiner in the 
Office Action of August 11, 2003. 

7. Duggleby, Gene 190:245-249 (1997), cited by the Examiner in the Office Actions of 
February 6, 2002 and November 20, 2002. 

8. Bowie et al., Science 247:1306-1310 (1990), cited by the Examiner in the Office 
Action of November 20, 2002. 

9. Lazar et al., Molecular and Cellular Biology 8:1247-1252 (1988), cited by the Examiner 
in the Office Action of November 20, 2002. 

10. Broun et al., Science 282:1315-1317 (1998), cited by the Examiner in the Office 
Action of November 20, 2002. 

11. Peterbauer et al., Planta 215:839-846 (2002), cited by the Examiner in the Office 
Actions of August 11, 2003 and December 2, 2005. 

12. Exhibit 4, alignment of SEQ ID NO: 5 of the instant application with SEQ ID NO: 7 
of the instant application, attached to the present Appeal Brief. 

IN THE U.S. PATENT AND TRADEMARK OFFICE 

Applicants: Eijiro WATANABE et al. 

Serial No.: 09/301,766 Group: 1638 

Filed: April 29, 1999 Examiner: D.H. Kruse 

For: RAFFINOSE SYNTHASE GENES AND THEIR USE 



DECLARATION UNDER 37 CFR 1.132 



Honorable Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

Sir: 

I, Eijiro WATANABE, citizen of Japan and residing in Fukui-cho 32-12-403, 
Takarazuka-shi, Hyogo-ken, Japan, declare and say that: 

1. I completed the doctor's course, with a major in agricultural chemistry, 
of the graduate school of Tokyo University and obtained a doctor's degree in agriculture 
at Tokyo University in March, 1991. 

2. From April, 1991, I conducted further research in the Department of 
Agricultural Chemistry, Faculty of Agriculture, Tokyo University, as a postdoctoral 
fellow (Japan Society for the Promotion of Science) for one year. 

3. From April, 1992 to the present, I have been an employee of 
Sumitomo Chemical Company, Limited, the assignee of the above-identified 
application. 

4. From April, 1992 to March, 2000, I was engaged in research 
on plant engineering using recombination and other gene manipulation, such as 
cloning of plant genes and preparation and evaluation of transgenic plants. 

5. I am one of the inventors of the above-identified application and am 
familiar with the subject matter thereof. 

6. I have read the Office Action mailed August 11, 2003 and the 
reference cited, and am familiar with the subject matter thereof. 

7. To demonstrate successful expression of raffinose synthase activity in 
transgenic plants, I have conducted the following experiments. 

Experiments 

Transformation of Tobacco with Raffinose Synthase Gene Derived from Brassica Plant 

The vector BjRS-Sac(+)-121 having the mustard raffinose synthase gene of the 
present invention in the expressible direction (i.e. sense direction) and the vector 
BjRS-Sac(-)-121 having the mustard raffinose synthase gene of the present invention in 
the reverse direction (i.e. antisense direction), which are the same as obtained in Example 
8 of the present specification, were used for the transformation of tobacco (Nicotiana 
tabacum) by the Agrobacterium infection method. 

Agrobacterium tumefaciens (strain LBA4404 having rifampicin and streptomycin 
resistance) previously converted into a competent state by calcium chloride treatment was 
transformed independently with two plasmids BjRS-Sac(+)-121 and BjRS-Sac(-)-121. 
The transformants were selected on LB medium containing 50 µg/ml rifampicin and 
25 µg/ml kanamycin by utilizing the kanamycin resistant character conferred by the 
kanamycin resistant gene (neomycin phosphotransferase, NPTII) of the introduced 
plasmids. 

The transformant Agrobacterium obtained (Agrobacterium tumefaciens strain 
LBA4404, rifampicin and streptomycin resistant) was cultured on LB medium containing 
50 µg/ml rifampicin and 25 µg/ml kanamycin at 28°C for a whole day and night, and the 
culture was used for the transformation of tobacco by the method described below. 

Seeds of tobacco were aseptically sown on 1/2 MS medium containing 2% 
sucrose and 0.7% agar. After one week, leaves of sprouting plants were cut out with a 
scalpel, and transferred to MS medium containing 3% sucrose, 0.7% agar, 1.0 mg/l BA 
and 0.1 mg/l NAA, followed by preculture for 1 day. The precultured leaves were 
transferred into a 1000-fold dilution of the Agrobacterium culture broth and allowed to stand 
for 5 minutes. The leaves were transferred again to the same medium as used in the 
preculture, and cultured for 3 to 4 days. The cultured leaves were transferred to MS 
medium containing 3% sucrose, 1.0 mg/l BA, 0.1 mg/l NAA and 500 mg/l cefotaxim, and 
shaken for 1 day to remove microbial cells. The leaves thus treated were transferred to 
MS medium containing 3% sucrose, 0.7% agar, 1.0 mg/l BA, 0.1 mg/l NAA, 100 mg/l 
cefotaxim and 20 mg/l kanamycin, and cultured for 3 to 4 weeks. The leaves were 
transferred to MS medium containing 3% sucrose, 0.7% agar, 1.0 mg/l BA, 0.1 mg/l NAA, 
100 mg/l cefotaxim and 20 mg/l kanamycin, and cultivated. The cultivation on this 
medium was continued with subculturing at intervals of 3 to 4 weeks. When shoots 
began to regenerate, these shoots were subcultured on MS medium containing 3% sucrose, 
0.7% agar and 20 mg/l kanamycin, and cultivated for 3 to 4 weeks. The rooting plants 
were transferred to vermiculite : peat moss = 1:1, and cultivated at 21°C to 22°C in a 
cycle of day/night = 12 hours : 12 hours. With the progress of plant body growth, the 
plants were grown with cultivation soil. 

Measurement of Raffinose Synthase Activity 

Leaves of the transformed tobacco plant were placed in a buffer, in an amount 10 times 
the leaf weight, containing 100 mM Tris-HCl (pH 7.4), 5 mM DTT (dithiothreitol), 1 mM EDTA, 
1 mM PMSF (phenylmethylsulfonyl fluoride) and 1 mM benzamide, and ground on ice with a mortar. 
The ground material was centrifuged at 21,400 x g for 50 minutes at 4°C. The 
resulting supernatant was recovered and used as a sample for the following 
measurement of raffinose synthase activity. 

The raffinose synthase activity was measured under the following conditions 
according to the description of L. Lehle and W. Tanner, Eur. J. Biochem., 38, 103-110 
(1973). 

First, 2 µl of a sample to be used in the measurement of activity was added to 
18 µl of a reaction mixture so that the final mixture contained 100 mM Tris-HCl (pH 7.4), 5 mM 
DTT (dithiothreitol), 0.01% BSA, 200 µM sucrose, 5 mM galactinol and 31.7 µM [14C] 
sucrose, and the reaction mixture was kept at 37°C for 18.3 hours. After the reaction, 
30 µl of ethanol was added to the reaction mixture, followed by stirring and 
centrifugation at 15,000 rpm for 5 minutes. The supernatant was spotted at a volume 
of 5 µl on an HPTLC plate of cellulose for thin layer chromatography (Merck, 10 cm x 
20 cm), and developed with n-butanol : pyridine : water : acetic acid = 60 : 40 : 30 : 3. 
The developed plate was dried and then quantitatively analyzed with an imaging 
analyzer (Fuji Photographic Film, FUJIX Bio Imaging Analyzer BAS-2000II) for the 
determination of content of [14C] raffinose. Raffinose synthase activity in each sample 
was calculated from the content of [14C] raffinose. 
Results 

Results are summarized in Fig. 1. The tobacco plant transformed with 
BjRS-Sac(+)-121 ("Sense" in the figure) showed a significantly higher level of raffinose 
synthase activity in leaf than the wild type ("Wild" in the figure). 

Discussion 

As can be seen from Fig. 1, the transformed tobacco plant having the mustard 
raffinose synthase gene of the present invention in sense direction exhibited higher 
raffinose synthase activity as compared with the control tobacco plant having no such 
gene. This indicates that tobacco plants may have improved raffinose synthase activity 
by introduction of the raffinose synthase gene of the present invention in sense direction 
into these plants. 

Thus, it is clearly demonstrated that the raffinose synthase gene of the present 
invention can successfully express raffinose synthase activity in the transformed tobacco 
plant. 

8. I declare further that all statements made herein of my own knowledge 
are true and that all statements made on information and belief are believed to be true; 
and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and that such willful false statements 
may jeopardize the validity of the above-identified application or any patent issued 
thereon. 



This day of February, 2004 




Eijiro WATANABE 






IN THE U.S. PATENT AND TRADEMARK OFFICE 



Applicants: Eijiro WATANABE et al. 

Serial No.: 08/992,914 Group: 1638 

Filed: December 18, 1997 Examiner: D.H. Kruse 

For: RAFFINOSE SYNTHASE GENES AND THEIR USE 



DECLARATION UNDER 37 CFR 1.132 

Honorable Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

Sir: 

I, Akitsu NAGASAWA, citizen of Japan and residing in Kamokogahara 
3-28-56, Higashi-Nada-ku, Kobe-shi, Hyogo-ken, Japan, declare and say that: 

1. I completed the master's course, with a major in agricultural biology, 
of the graduate school of Kyoto University and obtained a master's degree in 
agriculture at Kyoto University in March, 1984. 

2. From April, 1984 to the present, I have been an employee of 
Sumitomo Chemical Company, Limited, the assignee of the above-identified 
application. 

3. From April, 1984 to the present, I have been engaged in research 
on plant engineering using recombination and other gene manipulation, such as 
cloning of plant genes and preparation and evaluation of transgenic plants. 

4. I am one of the members of the research project related to the 
above-identified application and am familiar with the subject matter thereof. 

5. I have read the Office Action mailed March 11, 2005 and the 
reference cited, and am familiar with the subject matter thereof. 

6. To demonstrate successful identification of raffinose synthase genes 
in plants, I have performed the following computer analysis. 

ANALYSIS 

1) The overall sequence homologies (%) among the amino acid sequences of 
raffinose synthases (RFSs), seed imbibition protein (SIP) and stachyose synthases 
(STSs) shown in Table 1 attached hereto were calculated based on a global multiple 
alignment (the alignment of sequences over their entire length) using the gene analysis 
software GENETYX-SV/RC for Windows version 6.1.0 (GENETYX Corporation; 
http://www.sdc.co.jp/genetyx/) with default parameters. The global multiple 
alignment was generated using the CLUSTAL sequence analysis program. The amino 
acid sequences of the RFSs, SIP and STSs used to produce the global multiple 
alignment are as follows: 



Sc-02: 

MAPPS I TKTATLQDV I ST I D I GNGNSPLFS I TLDQSRDFLANGHPFLTQV 
PPNITTTTTTTASSFLNLKSNKDTIPNNNNT.MLLQQGCFVGFNSTEPKSH 
HVVPLGKLKG 1 KFMS I FRFKVWWTTHWVGTNGQELQHETQML I LDKNDSL 
GRPYVLLLPILENTFRTSLQPGLNDHIGMSVESGSTHVTGSSFKACLYIH 
LSNDPYSILKEAVKVIQTQLGTFKTLEEKTAPSIIDKFGWCTWDAFYLKV 
HPKGVWEGVKSLTDGGCPPGFVIIDDGWQSICHDDDDEDDSGMNRTSAGE 
QMPCRLYKYEENSKFREYENPENGGKKGLGGFYRDLKEEFGSVESVYVWH 
ALCGYWGGVRPGVHGMPKARVVVPKVSQGLKMTMEDLAVDK I VENGYGLV 
PPDFAHEMFDGLHSHLES AG I DGVKVDV I HLLELLSEEYGGRVEL ARAYY 
KALTSSYKKHFKGNGVIAS.MEHCNDFFLLGTEAISLGRVGDDFWCSDPSG 
DPNGTYWLQGCHMVHCAYNSLWMGNFIQPDWDMFQSTHPCAEFHAASRAI 
SGGP I YVSDCYGNHNFKLLKSLVLPDGS ILRCQHYALPTRDCLFEDPLHN 
GKTMLKIWNLNKYTGVLGLFNCQGGGWCPEARRNKSYSEFSRAYTCYASP 
EDIEWCNGKTPMSTKGYDFFAVYFFKEKKLRLMKCSDRLKYSLEPFSFEL 
MTVSP YKYFSKRF 1 QFAP I GLYNMLNSGGA I QSLEFDDNASL VK 1 GYRGC 
GEMSVFASEKPYCCKIDGVKVKFLYEDKMARVQILWPSSSTLSLYQFLF 



Sc-03: 

MAPSFSKENSKTCDEVANHDDCNTCP 1 1 SLEESNFMVNGHVI LSQVPSNI 
TA I SKMGFDGLFYGFDAPEPKARHVYSVGQLKG 1 PFMS I FRFKVWWTTHW 
TGSNGRDLEHETQ I L I LDKSDEGLGRP Y I Y 1 LPL I EGPFRASLQPGS VDD 
YYD I CVESGSTKYYGDSFR.AYL Y I RAGPDPFKL I KDTMKEYQAHLGTFKL 
LDDKTPPG I YDKFGWCTWDAFYLKYEXYGYWEGYKGLVENGYPPGLYL I D 
DGWQS I CHDDDP I TDQEG 1 NRTSAGEQMP CRL I KYEENFKFRDYKSPN I M 
GHEDHPNMGMRAFYRDLKEEFKTYEHYYYWHAFTGYWGGYRP.NYPGLXEA 
QVYTPKLSPGLEMTMEDLAYDKIYNNGIGLYQPDKAQELYEGLHSHLENC 



G I DGVKYDVI HLLEMMAEDYGGRVELAKTYYKA I TES YRKHFKGNGV I AS 
MEQCNDFMLLGTET I CLGRVGDDFWPTDPSGD INGTYffLQGCHMYHCAYN 
SLWMGNF IHPDWDMFQSTHPCAEFHAASRA I SGGP I YVSDVVGKHN I PLL 
KRLVLADGSILRCEYHALPTKDCLFVDPLHDGKTMLKIWNLNKYNGVLGV 
FNCQGGGWSRESRKNLCFSEYSKPISCKTSPKDVEWENGHKPFPIKGYEC 
FAMYFTKEKKL I LSQLSDT I E I SLDPFDYEL I YVSPMT I LPWES I AFAP I 
GLVNMLNAGGAVKSLDISEDNEDKMVQVGIKGAGEMMYYSSEKPKACRVN 
GEDMEFEYEESMIKVQVTWNHNSGGFTTVEYLF 



Sc-04 (truncated): 

MAPSISKTYELNSFGLVNGKLPLSITLEGSNFLANGHPFLTEYPENIIVT 
PSPIDAKSSKNNEDDDYYGCFVGFHADEPRSRHYASLGKLRGIKFMSIFR 
FKVWWTTHWYGSNGHELEHETQMMLLDKNDQLGRPFVL I LP I LQASFRAS 
LQPGLDDYYDYCMESGSTRVCGSSFGSCLYYHVGHDPYQLLREATKYVRM 
HLGTFKLLEEKTAPYIIDKFGWCTWDAFYLKYHPSGYWEGVKGLYEGGCP 
PGMYL IDDGWQA I CHDEDP I TDQEGMKRTSAGEQMPCRLYKLEENYKFRQ 
YCSGKDSEKGMGAFVRDLKEQFRSVEQVYYWHALCGYWGGVRPKVPGMPQ 
AKYVTPKLSNGLKLTMKDLAYDKIVSNGVGLVPPHLAHLLYEGLHSRLES 
AG I DGYKYDY IHLLEMLSEEYGGRYELAKAYYKALTASVKKHFKGNGY I A 
SMEHCNDFFLLGTEAIALGRVGDDFWCTDPSGDPNGTYWLQGCHMVHCAY 
NSLWMGNFIQPDWDMFQSTHPCAEFHAPLGPSLVDQFTLYIVLESTTSSC 
SRASLCLMGRFCYVKTMHSPHETVCLKTPCMMGRQCSKFGISTNIQYFWY 
YLIAKEYGGYP 



Sc-05: 

MAPPSVIKSDAAYNGIDLSGKPLFRLEGSDLLANGHYVLTDVPYN'VTVTA 
SPYLADKDGEPVDASAGSF I GFNLDGEPRSRHYAS I GKLRD I RFMS I FRF 
KVffWTTHWVGSKGSDIENETQI I ILENSGSGRPYYLLLPLLEGSFRSSFQ 
PGEDDDYAYCVESGSTQVTGSEFRQVYYYHAGDDPFKLYKDAMKYYRVHM 
NTFKLLEEKXPPGIYDKFGWCTWDAFYLTVNPDGYHKGYKCLVDGGCPPG 
LVLIDDGWQSIGHDSDGIDYEGMSCTVAGEQMPCRLLKFQENFKFRDYYS 
PKDKNEYGMKAFYRDLKEEFSTVDY I YYWHALCGY.ffGGLRPGAPTLPPST 
I VRPELSPGLKLTMQDLAYDK I VDTG I GFYSPDMANEFYEGLHSHLQNYG 
IDGYKYDYIHILEMLCEKYGGRYDLAKAYFKALTSSYNKHFDGNGYIASM 
EHCNDFMFLGTEA I SLGRYGDDFWCTDPSGD I NGTYWLQGCHMVHCAYNS 
LWMGNF ] QPDWDMFQSTHPCAEFHAASRAI SGGP I Y I SDCYGQHDFDLLK 
RL YLPDGS I LRCEHYALPTRDRLFEDPLHDGKTMLK I WNLNKYTG 1 1 GAF 
NCQGGGWCRETRRNQCFSQCYNTLTATTNPKDYEWNSGNNP I SYEKVEEF 
ALFLSQSKKLVLSGPNDDLEITLEPFKFELITYSPYYTIEGSSYQFAPIG 
LVNMLNTSGAIRSLVYHEESVEIGVRGAGEFRYYASRKPASCKIDGEVYE 
FGYEESMYMYQVPWSAPEGLSS I KYEF 

PsRFS: 

MAPPSITKTATQQDVISTVDIGNSPLLSISLDQSRNFLVNGHPFLTQVPP 
NITTTTTSTPSPFLDFKSNKDTIANNNNTLQQQGCFVGFNTTEAKSHHVV 
PLGKLKG I KFTS I FRFKVWWTTHWVGTNGHELQHETQ I L I LDKN I SLGRP 
YVLLLP I LENSFRTSLQPGLNDYVDMSVESGSTHVTGSTFKACLYLHLSN 
DP YRL VKEAVKV I QTKLGTFKTLEEKTPPS 1 1 EKFGWCTWDAFYLKVHPK 
GVWEGVKALTDGGCPPGFVIIDDGWQSISHDDDDPVTERDGMNRTSAGEQ 
MPCRL 1 KYEENYKFREYENGDNGGKKGLVGFVRDLKEEFRS VESVYVWHA 
LCGYWGGVRPKVCGMPEAKVVVPKLSPGVKMTMEDLAVDK I VENGVGLVP 
PNLAQEMFDGIHSHLESAGIDGVKVDVIHLLELLSEEYGGRVELAKAYYK 
ALTSSVNKHFKGNGVIASMEHCNDFFLLGTEAISLGRVGDDFWCCDPSGD 
PNGTYWLQGCHMVHCAYNSLWMGN'F I HPDWDMFQSTHPCAEFHAASRA I S 
GGPVYVSDCVGNHNFKLLKSFVLPDGS I LRCQHYALPTRDCLFEDPLHNG 
KTMLKIWNLNKYAGVLGLFNCQGGGWCPETRRNKSASEFSHAVTCYASPE 
D I EWCNGKTPMD I KGVDVFAVYFFKEKKLSLMKCSDRLEVSLEPFSFELM 
TVSPLKVFSKRL I QFAP I GLVNMLNSGGAVQSLEFDDSASLVK 1 GVRGCG 
ELSVFASEKPVCCK I DGVSVEFDYEDKMVRVQ 1 LWPGSSTLSLVEFLF 



Aj-05: 

MAPSFKNGGSNVVSFDGLNDMSSPFAIDGSDFTVNGHSFLSDVPENIVAS 
PSPYTS IDKSPVSVGCFVGFDASEPDSRHVVS IGKLKDIRFMS IFRFKYW 
WTTHWVGRNGGDLESETQ I V I LEKSDSGRPYVFLLP I VEGPFRTS I QPGD 
DDFVDVCVESGSSKVVDASFRSMLYLHAGDDPFALVKEAMK I VRTHLGTF 
RLLEEKTPPGIYDKFGWCTWDAFYLTVHPQGYIEGVRHLVDGGCPPGLVL 
IDDGWQSIGHDSDPITKEGMNQTYAGEQMPCRLLKFQENYKFRDYYNPKA 
TGPRAGQKGMKAF I DELKGEFKTYEHYYVWHALCGYWGGLRPQVPGLPEA 
RY I QPYLSPGLQMTMEDLAYDK I YLHKYGLYPPEKAEEMYEGLHAHLEKV 
G I DGYKI DY I HLLEMLCEDYGGRYDLAKAYYKAMTKS I NKHFKGNGY I AS 
MEHCNDFMFLGTEA 1 SLGRYGDDFWCTDPSGDPNGTFWLQGCHMVHCAND 
SLWMGNF IHPDWDMFQSTHPCAAFHAASRA I SGGP I YYSDSYGKHNFDLL 
KKLYLPDGS I LRSEYYALPTRDCLFEDPLHNGETMLK I WNLNKFTGY I GA 
FNCQGGGWCRETRRNQCFSQYSKRYTSKTNPKDIEWHSGENPISIEGYKT 
FALYLYQAKKL ILSKPSQDLDI ALDPFEFEL ITYSPYTKL I QTSLHFAP I 
GLYNMLNTSGAIQSYDYDDDLSSYEIGYKGCGEMRYFASKKPRACRIDGE 
DYGFKYDQDQMYYVQYPWPIDSSSGGISYIEYLF 



HvSIP: 

MTYTPQ I TYGDGRLAVRGRTYLSGYPDNVTAAHAAGAGLVDGAFYGATAA 
EAKSHHYFTFGTLRDCRFMCLFRFKLWWMTQRMGTSGRDYPLETQF I L I E 
VTAAAGNDDGDSSDGDSEPVYLVMLPLLEGQFRTVLQGNDQDELQICIES 
GDKAVETEQGMNNVYVHAGTNPFDT I TQAVKAVEKHTQTFHHREKKTVPS 
FVDWFGWCTWDAFYTDVTADGVKQGLRSLAEGGAPPRFL 1 1 DDGWQQ I GS 
ENKDDPGVAVQEGAQFASRLTG I RENTKFQSEHNQEETPGLKRLVDETKK 
EHGVKSVYVWHAMAGYWGGVKPSAAGMEHYEPALAYPVQSPGVTGNQPDI 
VMDSLSVLGLGLVHPRRVHRFYDELHAYLAACGVDGVKVDVQN I VETLGA 
GHGGRVALTRAYHRALEASVARNFPDNGCISCMCHNTDMLYSAKQTAVVR 
ASDDFYPRDPASHTVHISSVAYNTLFLGEFMQPDWDMFHSLHPAAEYHGA 
ARA I GGCP I YVSDKPGNHNFDLLRKLVLPDGS VLRAQLPGRPTRDCLFSD 
PARDGASLLKIWNMNKCAGVVGVFN'CQGAGWCRVAKKTRIHDEAPGTLTG 
SVRAEDVEAIAQAAGTGDWGGEAVVYAHRAGELVRLPRGATLPVTLKRLE 
YELFHVCP VRAVAPGVSFAP I GLLHMFNAGGAVEECTVETGEDGNAVVGL 
RVRGCGRFGAYCSRRPAKCSVDSADVEFTYDSDTGLVTADVPVPEKEMYR 
CALEIRV 



AniSTS: 

MAPPYDP I P I P I PMSA I LNFLSSTVKDNSFELLDGTLSVKNVP I LTD I PS 
NVSFSSFSSIVQSSEAPVPLFQRAQSLSSSGGFLGFSQNEPSSRLMNSLG 
KFTDRDFVS I FRFKTWWSTQWVGTTGSD 1 QMETQW I MLDVPE I KS Y AVVV 
P I VEGKFRS ALFPGKDGH 1 L I GAESGSTKVKTSNFDA I AYVHVSENP YTL 
MRDAYTAVRYHLNTFKL I EEKSAPPLVNKFGWWTWDAFYLTVEPAG I YHG 
VQEFADGGLTPRFL 1 1 DDGWQS I NNDDNDPNEDAKNLVLGGTQMTARLHR 
LDECEKFRKYKGGSMSGPNRPPFDPKKPKLL I SKA I E I EVAEKARDKAAQ 
SGYTDLARYEAEIEKLTKELDQMFGGGGEETSSGKSCSSCSCKSDNFGMK 
AFTKDLRTNFKGLDDIYVWHALAGAWGGVRPGATHLNAKIVPTNLSPGLD 
GTMTDLAVVKIIEGSTGLVDPDQAEDFYDSMHSYLSSYGITGYKVDVIHT 
LEY I SEDYGGRYELAKAYYKGLSKSLAKNFNGTGL I SSMQQCNDFFLLGT 
EQ I SMGRYGDDFWFQDPNGDPMGYYWLQGYHM I HCAYNSMWMGQF I QPDW 
DMFQSDHPGGYFHAGSRAICGGPYYYSDSLGGHNFDLLKKLYFNDGTIPK 
C I HFALPTRDCLFKNPLFDSKT I LK I WNFNKYGGY I GAFNCQGAGWDPKE 
QRIKGYSQCYKPLSGSVHYSGIEFDQKKEASEMGEAEEYAVYLSEAEKLS 
LATRDSDP I K I T I QSSTFE I FSFYP I KKLGEGVKFAP I GLTNLFNAGGT I 
QGLYYNEG I AK ] EYKGDGKFLAYSSYYPKKAYVNGAEKYFAWSGNGKLEL 
DITWYEECGGISNYTFVY 



PsSTS-1: 

MAPPLNSTTSNL 1 KTES IFDLSERKFKYKGFPLFHDYPENVSFRSFSS I C 
KPSESNAPPSLLQKYLAYSHKGGFFGFSHETPSDRLMNS I GSFNGKDFLS 
IFRFKTWWSTQWIGKSGSDLQMETQWILIEVPETKSYYYI IPI lEKCFRS 
ALFPGFNDHYKIIAESGSTKYKESTFNSIAYYHFSENPYDLMKEAYSAIR 
YHLNSFRLLEEKTIPNLYDKFGWCTWDAFYLTYNPIGIFHGLDDFSKGGY 
EPRFVIIDDGWQSISFDGYDPNEDAKNLVLGGEQMSGRLHRFDECYKFRK 
YESGLLLGPNSPPYDPNNFTDLILKGIEHEKLRKKREEAISSKSSDLAEI 
ESK I KKVVKE I DDLFGGEQFS SGEKSEMKSEYGLKAFTKDLRTKFKGLDD 
VYVWHALCGAWGGVRPETTHLDTKIVPCKLSPGLDGTMEDLAVVEISKAS 
LGLVHPSQANELYDSMHSYLAESGITGVKVDVIHSLEYVCDEYGGRVDLA 
KVYYEGLTKS I VKNFNGNGM I ASMQHCNDFFFLGTKQ I SMGRVGDDFWFQ 
DPNGDPMGSFWLQGVHM I HCSYNSLWMGQM 1 QPDWD.MFQSDHVCAKFHAG 
SRA I CGGP I YVSDNVGSHDFDL I KKL VFPDGT I PKCI YFPLPTRDCLFKN 
PLFDHTTVLK I WNFNKYGGV I GAFNCQGAGWDP IMQKFRGFPECYKP I PG 
TVHVTEVEWDQKEETSHLGKAEEYVVYLNQAEELSLMTLKSEP I QFT I QP 
STEEL YSFVPVTKLCGG I KFAP I GLTNMFNSGGTV I DLEYVGNGAK I KVK 
GGGSFLAYSSESPKKFQLNGCEVDFEWLGDGKLCVNVPWIEEACGVSDME 
IFF 



PsSTS-2: 

MAPPLNSTTSNL I KTES I EDLSERKFKVKGFPLFHDVPENVSERSFSS I C 
KPSESNAPPSLLQKVLAYSHKGGFFGFSHETPSDRLMNSLGSFNGKDELS 
IFRFKTWWSTQWIGKSGSDLQMETQWILIEVPETKSYVVI IPIIEKCFRS 
ALFPGFNDHVK 11 AESGSTKVKESTFNS I AYVHFSENPYDLMKEAY I A I R 
VHLNSFRLLEEKTIPNLVDKFGWCTWDAFYLTVNPIGIFHGLDDFSKGGV 
EPRFV 1 1 DDGWQS I SFDGCDPNEDAKNL VLGGEQMSGRLHRFDECYKFRK 
YESGLLLGPNSPPYDPKKFTDLILKGIEHEKLRKKREEAISSKSSDLAEl 
ESKIKKYVKEIDDLFGGEQFSSVEKSEMKSEYGLKAFTKDLRTKFKGLDD 
VYVWHALCGAWGGVRPETTHLDTKEVPCKLSPGLDGTMEDLAVVEISKAS 
LGLVHPSQANELYDSMHSYLAESG I TGVKVDV I HSLEYVCDEYGGRVDLA 
KVYYEGLTKS I VKNFNGNGM I ASMQQCNDFFFLGTKQ I SMGRVGDDFWFQ 
DPNGDPMGSFWLQGVHM I HCSYNSLWMGQM I QPDWDMFKSDHVCAKFHAG 
SRAI CGGP I YVSDNVGSHDFDL I KKLVFPDGT IPKC I YFPLPTRDCLFKN 
PLFDHTTLLK I WNFNKYGGV I GAFNCQGAGWDP I MQKFRGFPECYKP I PG 
TVHVTQVEWDQKEETSHFGKAEEYVVYLNQAEELCLMTLKSEP I QFT I QP 
STEEL YSFVPVTKLCGG I KFAP I GLTNMFNSGGTV I DLEYVGNGAK I KVK 
GGGSFLAYSSESPKKFQLNGCEVDFEWLGDGKLCVNVPWIEEACGVS 



SaSTS: 

MAPPNDPISSIFSPLISVKKDNAFELYGGKLSVKNVPLLSEIPSNVTFKS 
ESS I CQSSGAPAPLYNRAQSLSNCGGFLGFSQKESADSVTNSLGKFTNRE 
FVS I FRFKTWWSTQWVGTSGSD I QMETQW 1 MLNLPE I KSYAVV 1 P I VEGK 
FRSALFPGKDGHVLISAESGSTCYKTTSFTSIAYVHVSDNPYTLMKDGYT 
AVRVHLDTEKL I EEKS APPL VNKFGWCTWDAFYLTVEPAG I WNGVKEESD 
GGFSPRFL 1 1 DDGWQS 1 N I DGQDPNEDAKNLVLGGTQMTARLHRFDECEK 
FRKYKGGSMMGPKVP YFDPKKPKLL I SKA I E I EGVEKARDKA I QSG I TDL 



SQYE I KLKKLNKELDEMFGGGG.NDEKGSSKGCSDCSCKSQNSGMKAFTND 
LRTNFKGLDDIYVWHALAGAWGGVKPGATHLNAKIEPCKLSPGLDGTMTD 
LAWK 1 LEGS I GLVHPDQAEDFYDSMHSYLSKVG 1 TGVKVDV I HTLEYVS 
E.NYGGRVELGKAYYKGLSKSLKKNFNGSGLISSMQQCNDFFLLGTEQISM 
GRVGDDFWFQDPNGDPMGVFWLQGVHM I HCAYNSMWMGQ 1 1 HPDWDMFQS 
DHCSAKFHAGSRA I CGGPVYVSDSLGGHDFDLLKKLVFNDGTI PKCIHFA 
LPTRDCLFKNPLFDSKT 1 LK I WNFNKYGGVVGAFNXQGAGWDPKEQR I KG 
YSECYKPLSGSVHVSD I EWDQKVEATKMGEAEEYAVYLTESEKLLLTTPE 
SDP I PFTLKSTTFE 1 FSFVP I KKLGQGVKFAP I GLTNLFNSGGT I QGVVY 
DEGVAKIEVKGDGKFLAYSSSVPKRSYLNGEEVEYKWSGNGKVEVDVPWY 
EECGGISNITFVF 



VaSTS: 

MAPPNDPVNATLGLEPSEKVFDLSDGKLTVKGVVLLSHVPENVTFSSFSS 
I CVPRDAPSS ILQRVTAASHKGGFLGFSHVSPSDRL I NSLGSFRGRNFLS 
IFRFKTWWSTQWVGNSGSDLQMETQWILIEVPETESYVVI IPI lEKSFRS 
ALHPGSDDHVKICAESGSTQVRASSFGAIAYVHVAETPYNLMREAYSALR 
VHLDSFRLLEEKTVPRIVDKFGWCTWDAFYLTVNPVGVWHGLKDFSEGGV 
APRFVVIDDGWQSVNFDDEDPNEDAKNLVLGGEQMTARLHRFEEGDKFRK 
YQKGLLLGPNAPSFNPETIKELISKGIEAEHLGKQAAAISAGGSDLAEIE 
LM 1 VKVREE 1 DDLFGGKGKESNESGGCCCKAAECGGMKDFTTDLRTEFKG 
LDDVYVWHALCGGWGGVRPGTTHLDSKIIPCKLSPGLVGTMKDLAVDKIV 
EGS I GLYHPHQAiN'DLYDSMHSYLAQTGVTGVK I DV I HSLEYVCEEYGGRV 
E I AKAYYDGLTNS 1 1 KNFNGSG 1 1 ASMQQCNDFFFLGTKQ I PFGRVGDDF 
WFQDPNGDPMGVFWLQGVHMIHCSYNSLWMGQIIQPDWDMFQSDHECAKF 
HAGSRAI CGGPVYVSDSVGSHDFDL I KKLVFPDGTVPKC I YFPLPTRDCL 
FRNPLFDQKTYLKIWNFNKYGGVIGAFNCQGAGWDPKGKKFKGFPECYKA 
ISCTVHYTEYEWDQKKEAEHMGKAEEYVVYLNQAEYLHLMTPVSEPLQLT 
1 QPSTFEL YNFVP YEKLGSSN I KFAP I GLTNMFNSGGT I QELEY I EKDYK 
VKVKGGGRFLAYSTQSPKKFQLNGSDAAFQWLPDGKLTLNLAWIEENDGV 
SOLA IFF 



The calculated overall sequence homologies (%) are shown in Table 2 
attached hereto. The homologies between RFSs and SIP are less than 40%. The 
homologies between RFSs and STSs are not higher than 45%. On the other hand, the 
homologies among RFSs are all 50% or higher. Thus, the homologies among RFSs 
are higher than those homologies between RFSs and SIP and between RFSs and STSs. 

A molecular phylogenic tree of the RFSs, SIP and STSs shown in Table 1 is 
drawn in Figure 1 attached hereto. The molecular phylogenic tree was drawn by the 
UPGMA method using the gene analysis software GENETYX-SV/RC for Windows 
version 6.1.0 (GENETYX Corporation; http://www.sdc.co.jp/genetyx/) with default 
parameters. In the molecular phylogenic tree, RFSs, SIP and STSs form different 
groups respectively. 
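
The UPGMA grouping described above can be illustrated with a minimal sketch. It is not the GENETYX implementation, whose internals and default parameters are not given here. The labels and percent identities below are hypothetical toy values, chosen only to be consistent with the ranges stated in this declaration, and the distance is assumed to be 100 minus percent identity.

```python
from itertools import combinations

def upgma(labels, dist):
    """Cluster labels by UPGMA; dist maps frozenset({a, b}) -> distance."""
    clusters = {label: 1 for label in labels}  # cluster -> number of members
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))

# Hypothetical percent identities (toy values, not taken from Tables 2-4).
identity = {
    ("RFS1", "RFS2"): 65, ("STS1", "STS2"): 62,
    ("RFS1", "SIP"): 38, ("RFS2", "SIP"): 38,
    ("RFS1", "STS1"): 44, ("RFS1", "STS2"): 44,
    ("RFS2", "STS1"): 44, ("RFS2", "STS2"): 44,
    ("SIP", "STS1"): 35, ("SIP", "STS2"): 35,
}
dist = {frozenset(pair): 100 - pct for pair, pct in identity.items()}
tree = upgma(["RFS1", "RFS2", "SIP", "STS1", "STS2"], dist)
print(tree)
```

With these toy values the two RFS entries and the two STS entries each join first, and SIP joins last, which is the kind of separate grouping the tree in Figure 1 is said to show.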

In summary, Table 2 and Figure 1 show that RFSs, SIP and STSs can be 
distinguished from one another based upon a comparison of their amino acid 
sequences. 



2) Attached Table 3 shows the identities obtained using the BLAST program for 
the amino acid sequences of RFSs, SIP and STSs shown in Table 1. For Sc-02, 
Sc-03, Sc-04 and Sc-05, the identities were obtained by searching the "patent database" 
provided by NCBI (National Center for Biotechnology Information) with default 
parameters, using the amino acid sequence of each protein as the "query", and using 
"Protein query vs. translated database (tblastn)" of the NCBI BLAST program. The 
other identities were obtained by searching the "non-redundant database" provided by 
NCBI with default parameters, using the amino acid sequence of each protein as the 
"query", and using "Protein-protein BLAST (blastp)" of the NCBI BLAST program. 
The above-identified amino acid sequences of the RFSs, SIP and STSs were used as the 
"query", except that the amino acid sequence of Sc-04 used as the "query" was as follows: 

Sc-04 (full-length): 

MAPSISKTYELNSFGLVNGNLPLSITLEGSNFLANGHPFLTEVPENIIVT 
PSPIDAKSSKNNEDDDVYGCFVGFHADEPRSRHVASLGKLRGIKFMSIFR 
FKVWWTTHWVGSNGHELEHETQMMLLDKNDQLGRPFVLILPILQASFRAS 
LQPGLDDYVDVCMESGSTRVCGSSFGSCLYYHVGHDPYQLLREATKVVRM 
HLGTFKLLEEKTAPV II DKFGWCTWDAFYLKVHPSGVWEGVKGLYEGGCP 
PGMYL I DDGWQA I CHDEDP I TDQEGMKRTSAGEQMPCRLYKLEENYKFRQ 
YCSGKDSEKGMGAFYRDLKEQFRSYEQYYVWHALCGYWGGYRPKYPGMPQ 
AKVYTPKLSNGLKLTMKDLAYDKIYSNGYGLVPPHLAHLLYEGLHSRLES 
AG I DGYKVDY 1 HLLEMLSEEYGGRYELAKAYYKALTAS YKKHFKGNGY I A 
SMEHCNDFFLLGTEAIALGRYGDDFWCTDPSGDPNGTYWLQGCHMVHCAY 
NSLWMGNFIQPDWD.MFQSTHPCAEFHAASRAISGGPYYYSDCVGKHNFKL 
LKSLALPDGTILRCQHYALPTRDCLFEDPLHDGKTMLKIWNLNKYTGVLG 



LFNCQGGGWCPVTRRNKSASEFSQTVTCLASPQD I EWSNGKSP I C I KGMN 
VFAVYLFKDHKLKLMKASEKLEVSLEPFTFELLTVSPVIVLSKKLIQFAP 
IGLVNMLNTGGAIQSMEFDNHIDVVKIGVRGCGEMKVFASEKPVSCKLDG 
VVVKFDYEDKMLRVQVPWPSASKLSMVEFLF 



As shown in Table 3, the identities between RFSs and SIP are about 40%. 
The identities between RFSs and STSs range from about 40% to about 50%. On the 
other hand, the identities among RFSs are 60% or higher. The identities among STSs 
are also 60% or higher. That is, the identities among RFSs and the identities among 
STSs are higher than the identities between RFSs and SIP and the identities between 
RFSs and STSs. Thus, RFSs, SIP and STSs can be distinguished based on the results 
of analysis using the BLAST program. 



3) Attached Table 4 shows the identities obtained using another BLAST program 
for the amino acid sequences of RFSs, SIP and STSs shown in Table 1. All possible 
pairwise amino acid sequence comparisons were made with the "Blast 2 Sequences" 
program from NCBI (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html). Sequence 
identities were calculated using default parameters (program: blastp; matrix: 
BLOSUM62; open gap penalty: 11; extension gap penalty: 1; gap x_dropoff: 50; 
expect: 10.0; word size: 3). The amino acid sequences of the RFSs, SIP and STSs 
used to calculate sequence identities are identical to those used as the "query" to obtain 
the identities shown in Table 3. The results were essentially the same as those of the 
former two types of comparison. 
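
As a rough illustration of what a pairwise "identity" figure means, the sketch below computes percent identity from two sequences that have already been aligned. It assumes identity is the number of identical aligned residues divided by the alignment length, with "-" as the gap character. It does not reproduce the BLAST alignment step itself, and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both rows.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)

# Hypothetical aligned pair: 5 identical columns out of 7.
print(round(percent_identity("MKT-AYL", "MRTQAYL"), 1))
```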



4) In conclusion, raffinose synthases (RFSs), seed imbibition protein (SIP) and 
stachyose synthases (STSs) were clearly distinguished from one another based on 
comparison of their amino acid sequences. 



7. I declare further that all statements made herein of my own 
knowledge are true and that all statements made on information and belief are believed 
to be true; and further that these statements were made with the knowledge that willful 
false statements and the like so made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States Code and that such willful false 
statements may jeopardize the validity of the above-identified application or any patent 
issued thereon. 
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Fig. 1 
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The Cellulose Synthase Superfamily^ 
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The availability of a nearly complete genome se- 
quence for Arabidopsis has created many novel op- 
portunities to identify, by computational methods, 
the genes that encode enzymes, which have been 
difficult to characterize by conventional means. We 
have used this approach to identify a large family of 
genes of unknown function that show sequence sim- 
ilarity to cellulose synthase. Our working hypothesis 
is that these genes encode enzymes that catalyze the 
synthesis of non-cellulosic polysaccharides (Cutler 
and Somerville, 1997). 

A recent breakthrough in research concerning the 
biogenesis of plant cell walls was the identification, 
by genomic methods, of genes encoding cellulose 
synthase in cotton fibers (Pear et al., 1996; Delmer, 
1999). The cotton cellulose synthase genes, now 
termed CesA1 and CesA2, were identified in a collection 
of expressed sequence tag (EST) sequences on 
the basis of weak sequence similarity to genes for 
cellulose synthase from bacteria. In addition, the 
genes were expressed at high levels in cotton fibers at 
the onset of secondary wall synthesis and a purified 
fragment of one of the corresponding proteins was 
shown to bind UDP-Glc, the proposed substrate for 
cellulose biosynthesis. The conclusion that the cotton 
CesA genes are cellulose synthases is supported by 
results obtained with two cellulose-deficient Arabidopsis 
mutants, rsw1 (Arioli et al., 1998) and irx3 
(Turner and Somerville, 1997; Taylor et al., 1999). The 
genes corresponding to the RSW1 and IRX3 loci exhibit 
a high degree of sequence similarity to the cotton 
CesA genes and are considered orthologs. Ten full-length 
CesA genes have been sequenced from Arabidopsis, 
and there is a genome survey sequence that 
may indicate one additional family member (Fig. 1). 

It is not known at this time whether other polypep- 
tides are also required for cellulose synthase activity 
(i.e. the CesA polypeptides may be a component of a 
multisubunit enzyme complex). Until this matter is 
resolved we consider it expedient to simply refer to 
the CesA family members as cellulose synthase. The 
observation that IRX3 (AtCesA7), which is required 
for secondary wall cellulose synthesis, is in a different 
branch of the CesA tree than RSW1 (AtCesA1), 



^ This work was supported in part by the U.S. Department of 
Energy (grant no. DOE-FG02-00ER20133). 
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which is required for primary wall synthesis (Fig. 1), 
may indicate that there is sequence divergence be- 
tween the enzymes involved in primary and second- 
ary wall synthesis. 

Reiterative database searches using the Arabidopsis 
Rsw1 (AtCesA1) and the cotton CesA polypeptide 
sequences as the initial query sequences revealed a 
large superfamily of at least 41 CesA-like genes in 
Arabidopsis. Based on predicted protein sequences, 
we have grouped these genes into seven clearly distinguishable 
families (Fig. 1): the CesA family, which 
includes RSW1 and IRX3 (AtCesA7), and six families of 
structurally related genes of unknown function designated 
as the "cellulose synthase-like" genes (CslA, 
CslB, CslC, CslD, CslE, and CslG). The nomenclature 
for these families is still under discussion 
(http://mbclserver.rutgers.edu/CPGN/CelluloseWeb/CesA.proposal.html), 
so the Csl designation for these genes 
should be considered temporary and may be revised 
as the enzymatic function of the members of each 
family is determined. 

All of the members of the cellulose synthase super- 
family appear to be integral membrane proteins, with 
three to six transmembrane domains in the carboxy 
terminal region of the protein and one or two trans- 
membrane domains in the amino terminal region. It 
is thought that the CesA proteins are located in the 
plasma membrane (Delmer, 1999). If the Csl proteins 
participate in the synthesis of non-cellulosic polysac- 
charides, they would be expected to be located in the 
Golgi apparatus. Preliminary analysis suggests that CslB, CslG, 
and CslE fusions to green fluorescent protein 
localize to the Golgi (T. Richmond and C. Somerville, 
unpublished data). Also, immunolocalization 
studies with an antibody to the CslA protein indicate 
that this family is localized to the cytoplasm (i.e. 
the Golgi apparatus) rather than the plasma membrane 
(N. Sprenger and C. Somerville, unpublished 
data). 

Intron-exon organization is conserved among the 
CesA, CslB, CslG, and CslE gene families, but not 
the CslA, CslC, or CslD families (Fig. 2). However, the 
C-terminus of a subset of the CslD genes is congruent 
with this organization as well. The CslD gene family 
is the most similar of the Csl gene families to the CesA 
family (approximately 45% identical at the amino 
acid level). The gene structure for this family is unusual 
in that the seven genes for which complete 
genomic sequence information is available have four 
different patterns of intron-exon organization. Based 
on recent thinking about the evolution of intron/ 
exon structure (de Souza et al., 1998), the small num- 
ber of introns in this family, and their divergent 
nature, would seem to suggest that this gene family 
is the oldest in the cellulose synthase superfamily 
and may predate the CesA family. 

All members of the CesA family contain a putative 
LIM-like Zn-binding domain/RING finger domain in 
the N-terminal region, which is similar to several pu- 
tative plant Leu zipper transcription factors (Kawagoe 
and Delmer, 1997a, 1997b; Arioli et al., 1998). LIM 
domains are known to mediate protein-to-protein in- 
teractions (Bach, 2000), whereas RING finger domains 
are thought to play a role in ubiquitin-mediated pro- 
teolysis (Freemont, 2000). These domains may play a 
role in mediating CesA function via protein partners 



or targeted degradation. All of the Csl proteins lack 
this amino terminus extension, including the CslD 
family, which contains proteins similar in size to the 
CesAs. 

Although the various CesA and Csl proteins vary 
in their degree of sequence similarity to one another 
(Table I), they share several features that have been 
proposed to be indicative of processive glycosyl- 
transferases (Saxena et al., 1995). All of the CesA and 
Csl gene products contain a D,D,D,QxxRW motif 
(Fig. 2), which has been proposed to define the nu- 
cleotide sugar-binding domain and the catalytic site 
of these enzymes. Based on this motif, the proposed 
topology of these proteins (discussed above), and 
sequence-based classification, the various members 
of the Arabidopsis cellulose synthase superfamily ap- 
pear to belong to family 2 of the inverting nucleotide- 



Figure 2. Comparison of the gene si 
representative genes of the Arabidopsis CesA 
superfamily. Colored boxes represent exons and 
the lines connecting them denote introns. Thick 
vertical black bars indicate predicted transmem- 
brane domains as predicted by HMMTOP 
(http://www.enzim.hu/hmmtop/). Thin blue bars 
represent conserved Asp residues, and the 
thicker gray bar represents the QxxRW domain. 
Thin lines connecting different genes indicate 
conserved intron-e: 




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Table I. Identity/similarity matrix for selected members of the CesA superfamily 



Identity 



AtCesA1 
AtCslD3 
AtCslB1 
AtCslG1 
AtCslE1 
AtCslA2 
AtCslC4 



diphospho-sugar glycosyltransferases (Campbell et 
al., 1997) that synthesize repeating β-glycosyl unit 
structures. To date, this family includes over 500 putative 
members, including cellulose synthase, chitin 
synthase, hyaluronan synthase, β-1,3-glucan synthase, 
and a number of uncharacterized genes from many 
organisms (Campbell et al., 1997; 
http://afmb.cnrs-mrs.fr/~pedro/CAZY/gtf_2.html). The function of the 
various Csl families is not known, but speculation is 
that they are responsible for producing some of the 
other polysaccharides found in plant cell walls and in 
secretions such as root cap or stylar mucilage (Cutler 
and Somerville, 1997). Although the D,D,D,QxxRW 
motif is thought to be indicative of processive β-glycosyltransferases, 
there is no comparative sequence 
data available on processive α-glycosyltransferases. 
Therefore we cannot rule out the possibility that some 
of these enzymes produce polysaccharides with 
α-linkages, such as rhamnogalacturonan I or rhamnogalacturonan II. 
It is possible that linkage specificity 
is determined by subtle features in the active site 
of the proteins (Stasinopoulos et al., 1999) and that 
members of the Arabidopsis cellulose synthase superfamily 
make polysaccharides with both β- and 
α-linkages. 

DISCUSSION 

With six families of Csl genes and six major non-cellulosic 
polysaccharides in Arabidopsis (i.e. callose, 
xyloglucan, glucuronoarabinoxylan, homogalacturonan, 
rhamnogalacturonan I, and rhamnogalacturonan II), 
it is tempting to speculate that each family is 
responsible for the biosynthesis of one of the principal 
polysaccharides of the cell wall. Although we 
consider it possible that the gene superfamily described 
here encodes enzymes that catalyze the synthesis 
of different polymers, there is at present no 
evidence for this other than the observation that sequence 
divergence is frequently associated with 
functional divergence. It is also possible that there 
are additional functional divisions within the gene 
families that are not evident from our analysis. Recent 
results concerning the relationship between enzyme 
structure and function, such as experiments 
showing that as few as four amino acid changes can 
alter the catalytic outcome of an enzymatic reaction 
from desaturation to hydroxylation (Broun et al., 
1998), emphasize the need for caution in using 
sequence similarity to infer function. 

The amount of plant genome sequence and EST 
information in the public sequence databases is ex- 
panding rapidly. At present there are more than 
900,000 plant ESTs and genome survey sequences in 
GenBank, most of which are from 35 species. In the 
first 8 months of the year 2000, more than 516,000 new 
ESTs and genome survey sequences from 16 plant 
species were deposited. Thus except for species such 
as Arabidopsis, which will soon be completely se- 
quenced, any attempt at a comprehensive compilation 
of CesA-related sequence information represents a 
continuing challenge. To facilitate research on these 
genes, we have established a website (http://cellwall. 
stanford.edu) that summarizes the ever-increasing 
number of cellulose synthase and cellulose synthase- 
like genes. At present, there are more than 1,250 CesA 
and Csl sequences, from 29 different plant species in 
GenBank. Although the most extensive information 
available is for Arabidopsis where there are more than 
330 partial or complete gene sequences, there is also a 
significant amount of information available for several 
other species, especially rice, maize, soybean, and to- 
mato. A crude estimate of the relative abundance of 




Figure 3. Relative abundance of EST sequences for members of the 
CesA and Csl families in GenBank. 
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mRNA for the various family members can be calcu- 
lated from the frequency with which each gene family 
is represented by EST sequences in the public data- 
bases (Fig. 3). 

Polysaccharides found in other plant species, but 
not in Arabidopsis (Zablackis et al., 1995), such as 
mixed linkage xylans, mannans, or arabinans, may be 
synthesized by genes that are not represented by 
orthologs in Arabidopsis. A number of gene se- 
quences from plants in GenBank show limited simi- 
larity (<50% identity) to the members of the various 
Csl families in Arabidopsis. This and other issues will 
undoubtedly become more transparent when the 
function of the Csl genes in Arabidopsis is known 
from direct experimental evidence. Our laboratory, 
along with others, is examining the patterns of gene 
expression and protein localization of the Arabidop- 
sis Csl genes, and attempting to characterize their 
enzymatic function using reverse genetics. We are 
confident that in the next several years the function 
of these genes will be understood and it will then be 
possible to begin to unravel the challenge of under- 
standing how cell wall composition and deposition is 
controlled. 

Received May 25, 2000; accepted July 7, 2000. 
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Abstract 

Acetolactate synthase catalyses the first step in branched-chain amino acid biosynthesis. The bacterial enzyme contains two 
large and two small subunits but there is only limited and circumstantial evidence for a small subunit in the eukaryotic enzyme. 
Here this evidence is summarised and protein sequences of two putative eukaryotic small subunits, from a yeast and a red alga, 
are presented. © Elsevier Science B.V. All rights reserved. 

Keywords: Branched-chain amino acid; Chloroplast genome; Herbicide; Yeast genome 



1. Introduction 

Acetolactate synthase (ALS) is an essential enzyme 
in plants and many microorganisms because it catalyses 
the first step in the biosynthesis of branched-chain amino 
acids. In some bacteria it also plays a catabolic role, 
supplying acetolactate for the butanediol fermentation. 

There appear to be two distinct forms of the enzyme 
that correspond to these functional roles. The anabolic 
enzyme contains FAD (Schloss et al., 1985) and is 
inhibited by the branched-chain amino acids (Weinstock 
et al., 1992) while the catabolic enzyme, sometimes 
referred to as the 'pH 6 acetolactate-forming enzyme', 
displays neither of these properties (Størmer, 1968; Peng 
et al., 1992). A further property of the anabolic enzyme 
is that it is inhibited by a number of compounds that 
are used as herbicides (Schloss et al., 1988). The remainder 
of this article concerns the anabolic enzyme only. 

Many of the bacterial ALSs have been shown to be 
heterotetramers composed of two types of subunit, large 
and small. The latter subunit was first identified (Squires 
et al., 1983) for Escherichia coli isoenzyme III (ALSIII); 
DNA sequencing revealed an open reading frame that 
appeared to have a homologue in the operon that 

+ Corresponding author. Tel.: +61 7 33654615; Fax: +61 7 33654699; 
e-mail: duggleby@biosci.uq.edu.au 

Abbreviations: ALS, acetolactate synthase. 



contains the gene for E. coli ALSII (Lawther et al., 
1981). The protein product of the small subunit gene 
was later identified for E. coli ALSI (Eoyang and 
Silverman, 1984) and Salmonella typhimurium ALSII 
(Schloss et al., 1985). 

The role of the small subunit is not entirely clear and 
it may be that it is involved in more than one way. For 
the various E. coli isoforms it has been shown that this 
subunit affects sensitivity to branched-chain amino acids 
(Eoyang and Silverman, 1986; Sella et al., 1993), specific 
activity (Lu and Umbarger, 1987), stability (Sella et al., 
1993) and the kinetic properties (Weinstock et al., 1992). 

Putative small subunit genes have been identified for 
a number of other bacterial species. This identification 
has been based mainly, and in most cases solely, on the 
presence of an open reading frame 3' to the large subunit 
gene. In contrast, the presence of a small ALS subunit 
has never been demonstrated unequivocally in eukary- 
otes. Certainly no open reading frame nearby the large 
subunit gene has been identified but this is not surprising 
since operons are not a feature of eukaryotic genomes. 
However, there is some evidence that a small subunit 
may exist. 

First, purified wheat ALS contains a low molecular 
weight component (Southan and Copeland, 1996) that 
could be a small subunit; on the other hand, it could be 
simply an impurity. Purified barley ALS has been 
reported to contain no small subunit (Durner and Böger, 
1988) on the basis of SDS-PAGE. However, it is conceiv- 
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Bfl MANSDVTRHILSVLVQDVDGIISRVSGMFTRRAFNLVSLVSAKT 44 

Ccr MTANVQPAPASAYDLSPKDQAEQSATFALLVDNEPGVLHRWGLFAARGYNIESLTVAET 60 

Cgl MAHSDVTRHILSVLVQDVDGIISRVSGMFTRHAFNLVSLVSAKT 44 

EcoH MRRILSVLLENESGALSRVIGLFSQRGVNIESLTVAPT 38 

EcoM MQHQVNVSARFNPETLERVLRWRHRGFHVCSMNMAAA 38 

EcoN MQNTTHDMVILELTVRNHPGVMTDVCGLFARRATNVEGILCLPI 44 

Mav MSPQTHTLSVIVEAKPGVLARVAALPSBRGFNIESLAVGAT 41 

Sav HSKHTLSVLVEHKPGVLARITALFSRRGFNIDSLAVGVT 39 

Sty MRRILSVLLEHESGALSRVIGLFSORGYMIESLTVAPT 38 



Bfl E-THGINKITVVVD-ADEUIIEQiaTCQIJOCLIPVIjrmiLDEETT-IARAIHLVKVSADS 101 

Ccr DRKAHTSRITVVTR-GTRHVIJ)QIEAQLNKVVNVRRVHDVTRDPNGVEREIaALV^ 119 

Cgl E-THGIMRITVWD-ADELmEQITKQLNKLIPVLKWRLDEETT-IAHAIMLVKVSADS 101 

EcoH D-DPTLSRMTIQTV-GDEKVLEQIEKQLHKLVDVLRVSELGQGAH-VEREIMLVKIQASG 95 

EcoM S-DAQNINIELTVA~SPRSVDLLPSQLNKLVDVAHVAICQS'm~SQQIRA 86 

EcoN Q-DSDKSHIWLLVH--DDQRIiEQMISQIDKLEDWKVQRNQSDPTMPNKIAVFFQ 96 

Mav E-QKDMSRMTIWS-AEETPLEQITKQLNKLINVIKIVELEDGHS-VSRELALIKVRADA 98 

Sav E-HPDISRITIVVHVIEALPLEQVTKQLNKLVNVLKIVELEPSAGRAGGELVLVKVRADN 98 

Sty D-DPTLSRMTIQTV-GDEKVLEQIEKQLHKLVDVLRVSELGQGAH-VEREIKLVKMEASG 95 



Bfl TNRPQIVDAANIFRARWDVAPDSWIESTGTPGKLRALLDVMEPFG-IRELIQSGQIAL 160 

Ccr VDRLEALRIAEIFRAKPVDTTLESFVFEISGAPSKIDKFLDLMRPLG-LVELSRTGVLSI 178 

Cgl TKRPQIVDAANIFRARWDVAPDSWIESTGTPGKLRALLDVMEPFG-IRELIQSGQIAL 160 

EcoH YGRDEVKRNTEIFRGQIIDVTPSLYTVQLAGTSGKLSAPLASIHDVAKIVEVARSGWGL 155 



EcoN 96 

Mav GTRSQVIEAVNLFRAKVIDVSPEALTIEATODRGKIEALLRVLEPSV-SVRSS-NREWCR 156 

Sav ETRSQIVEIVQLFBAKTVDVSPEAVTIEATGGSDKLEAMLKMLEPFR-HQGARQSGTIAI 157 

Sty YGREEVKRNTEIFRGQIIDVTPTLYTVQLAGTSDKLDAFLASLRDVAKIVEVARSGWGL 155 



Bfl NRGPKTMAPAKI 172 

Ccr ERGFEGM 185 

Cgl NRGPKTMAPAKI 172 

EcoH SRGDKIMR 163 

EcoM 86 

EcoN 96 

Mav CPGPRGIGTAK 167 

Sav GRGARSITDRSLRPLDRSA 176 

Sty SRGDKIMR 163 



Fig. 1. Alignment of selected ALS small subunit protein sequences. Sequences were obtained from GenBank and aligned using the ClustalW 
(Thompson et al., 1994) program. An asterisk indicates a totally conserved residue, while a full stop denotes a position where there are conservative 
substitutions. Abbreviations used are: Bfl, Brevibacterium flavum MJ233; Ccr, Caulobacter crescentus; Cgl, Corynebacterium glutamicum; EcoH, 
E. coli ilvH (ALSIII); EcoM, E. coli ilvM (ALSII); EcoN, E. coli ilvN (ALSI); Mav, Mycobacterium avium; Sav, Streptomyces avermitilis; Sty, 
S. typhimurium. 



able that it could be lost during multistep purification; 
in this context it is relevant that the various E. coli 
isoforms have differing affinities for their respective 
small subunits and that, for ALSIII, the small subunit 
is readily lost (Sella et al., 1993). In addition, even when 
a small subunit is present, it is not easily observed by 
SDS-PAGE (De Rossi et al., 1995) because it migrates 
as a rapidly moving, diffuse band that stains only weakly 
with Coomassie blue. 

Second, we have confirmed (Chang and Duggleby, 
unpublished) that expression of the Arabidopsis thaliana 
ALS-encoding gene in E. coli results in an enzyme that, 
unlike the enzyme from the plant itself, is insensitive to 
inhibition by branched-chain amino acids (Singh et al., 
1992). The suggested explanation (Singh et al., 1992) is 
that the expressed enzyme lacks a small subunit, 
although no evidence was adduced to support this 
proposal. A number of other explanations of this obser- 
vation are possible, such as different post-translational 
processing, including proteolysis, between prokaryotes 



and eukaryotes. The plant enzyme is located in the 
chloroplast and contains an amino-terminal sequence 
that is believed to be a chloroplast transit peptide 
(Mazur et al., 1987). Although the enzyme expressed in 
E. coli is processed to a similar size as the native enzyme 
(Singh et al., 1992), it is not known whether cleavage 
of the transit peptide is at the same site as in the plant. 
Expression of the yeast enzyme in E. coli also results in 
an enzyme that is kinetically distinguishable from the 
native enzyme (Poulsen and Stougaard, 1989); this 
difference has also been ascribed to the lack of the 
appropriate small subunit. 

Third, over-expression of the A. thaliana ALS-encod- 
ing gene in tobacco (Odell et al., 1990) or oilseed rape 
(Ouellet et al., 1994) gives greatly elevated amounts of 
the corresponding mRNA, but much smaller increases 
in ALS activity. This lack of correlation could be 
interpreted to indicate that some other component, such 
as a small subunit, is limiting. 

Although none of these lines of evidence for an ALS 
Sce MLRSLLQSGHRRVVASSCATHVRCSSSSTSALAyKQHHRHATRPPLPTLDTPSWNANSAV 60 

Bfl MANSDVTRHILSVLVQDVDGIISRVSGMFTRRAFNLVSLVSAKTETHGINR 51 

Ppu HKHTLSVLVQDEAGVLSRISGLFARRGFNIASLAVGPAEQIGVS- 44 

Sce SSIIYETPAPSRQPHKQHVLNCLVQKEPGVLSRVSGTLAARGFNIDSLWCNTEVKDLSR 120 

Bfl ITVVVDADELNIEQITKQlJnCLIPVLKVVRLDEETTIARAIHLVKVSAD 100 

Ppu ITMWQGDNRTIEQLTKQLYKLVNILNVQDVTNIPSVEHEIiMLIKIQVN 93 

Sce MTIVLQGQDGWEQARRQIEDLVFVyAVLDYTNSEIIKRELVMARISLLGTEYFEDLLLH 180 
* * ++ * *+ * + ++ *++ + 

Bfl STNRPQIVD AANIFRARWDVA 122 

Ppu SQNRIEALEVK IFRAHWDIA 114 

Sce HHTSTNAGAADSQELVAEIREKQPHPAHLPASEVLRLKHEHLNDITNLTNNFGGRWDIS 240 
*+ + * ***+ 

Bfl PDSWIESTGTPGKLRALLDVMEPFGIRELIQSGQIALNRGPK—TMAPAKI 172 

Ppu EDLllVEVTGDPGKIVAIEQLLTKPGIIEIARTflKISLVRTSKINTyiiKDKWAyHA 171 

Sce ETSCIVELSAKPTRISAFLKLVEPFGVLECARSGMMALPRTPLKTSTEEAADEDEKISEM 300 



Sce VDISQLPPG 309 

Fig. 2. Alignment of a bacterial and two putative eukaryotic ALS small subunit protein sequences. The B. flavum MJ233 (Bfl), P. purpurea (Ppu) 
and S. cerevisiae (Sce) sequences were obtained from GenBank and aligned using the ClustalW (Thompson et al., 1994) program. An asterisk 
indicates a totally conserved residue, while a plus denotes a position where the two eukaryotic sequences are identical. The underlined segment 
represents the most highly conserved region. 



small subunit in eukaryotes is alone convincing, taken 
together they suggest that further work is merited. 
Recent advances in genome sequencing have provided 
an opportunity to look for ALS small subunit genes. 
Here the presence of such genes in two eukaryotic 
species is reported. 



2. Results and discussion 

There is limited similarity between the protein 
sequences of known ALS small subunits as illustrated 
in Fig. 1, which shows an alignment of several such 
sequences. In all, only four residues are totally conserved 
and a further 17 positions show conservative substitu- 
tions. A consensus sequence was derived from this 
alignment and compared to the individual sequences. 
The B. flavum and C. glutamicum sequences are most 
similar to the consensus and the former was used as a 
representative small subunit sequence. Various databases 
in GenBank were searched for conceptual translations 
into protein sequences that are similar to this B. flavum 
small subunit, using the BLAST program. In addition 
to known bacterial ALS small subunit genes, this search 
identified two well-matched eukaryotic genes: one (with 
a probability of arising by chance of 8.2 x 10"''^) from 
the chloroplast genome of the red alga Porphyra pur- 
purea (Reith and Munholland, 1995) and the other 
{P=l.l from chromosome III of the yeast 

Saccharomyces cerevisiae (Oliver et al., 1992). The next 
best match (P=1.0) corresponded to a fragment of a 



mouse lipoxygenase gene (Chen et al., 1994), mis- 
translated in a reverse reading frame. The S. cerevisiae 
and P. purpurea sequences are shown in Fig. 2, aligned 
with that of the ALS small subunit of B. flavum. These 
results clearly indicate that S. cerevisiae and P. purpurea 
contain a gene that could encode an ALS small subunit. 
A total of 38 residues are identical in all three sequences 
and, when only the two eukaryotes are compared, there 
are 66 identities. The most highly conserved region is a 
26 residue sequence near the amino terminus, with the 
motif LVQXXXGφφSRφSGXXXXRXFNφXSL, where 
X is any amino acid while φ is one of V, L or I. 
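
The 26-residue motif can be checked mechanically. The sketch below is an illustration only, not the search procedure used by the author. It writes the motif with φ standing for the positions restricted to V, L or I, converts it to a regular expression, and tests it against the amino-terminal fragments as they are printed in Fig. 2. The fragment strings are copied from this reproduction of the figure and may carry transcription errors outside the motif region.

```python
import re

MOTIF_TEXT = "LVQXXXGφφSRφSGXXXXRXFNφXSL"  # X = any residue, φ = V, L or I

def motif_to_regex(motif):
    """Translate the motif notation into a compiled regular expression."""
    parts = []
    for ch in motif:
        if ch == "X":
            parts.append(".")
        elif ch == "φ":
            parts.append("[VLI]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

MOTIF = motif_to_regex(MOTIF_TEXT)

def find_motif(sequence):
    """Return the matching segment, or None if the motif is absent."""
    match = MOTIF.search(sequence)
    return match.group() if match else None

# Amino-terminal fragments as printed in Fig. 2.
fragments = {
    "Bfl": "MANSDVTRHILSVLVQDVDGIISRVSGMFTRRAFNLVSLVSAKTETHGINR",
    "Ppu": "HKHTLSVLVQDEAGVLSRISGLFARRGFNIASLAVGPAEQIGVS",
    "Sce": "SSIIYETPAPSRQPHKQHVLNCLVQKEPGVLSRVSGTLAARGFNIDSLWCNTEVKDLSR",
}
for name, seq in fragments.items():
    print(name, find_motif(seq))

# Positions that are not X are the constrained ones.
constrained = sum(1 for ch in MOTIF_TEXT if ch != "X")
print(len(MOTIF_TEXT), constrained)
```

The motif is 26 positions long, of which 17 are constrained. That agrees with the 17 conserved residues referred to later in the text.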

The S. cerevisiae sequence is substantially longer than 
the others; compared to that of P. purpurea, there are 
an additional 50 residues in the middle of the protein, 
a short extension at the carboxyl-terminus, and a 75 
residue extension at the amino-terminus. It is suggested 
that the role of the latter is to act as a mitochondrial 
transit peptide, since it is known that S. cerevisiae ALS 
is a mitochondrial enzyme (Ryan and Kohlhaw, 1974). 
The transit peptides of other mitochondrial proteins 
frequently contain an arginine residue at position -2 
relative to the cleavage site (von Heijne et al., 1989) and 
it is noted that this putative S. cerevisiae ALS small 
subunit contains an arginine at position 75, close to 
where homology with the P. purpurea sequence begins. 
Thus, it is proposed that the cleavage site is immediately 
after K76. It has also been noted (von Heijne et al., 
1989) that mitochondrial transit peptides are enriched 
in A, L, R and S, but deficient in D and E. Similar 
characteristics are observed here, most notably for S 
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(which constitutes 18.4% of the first 76 residues but only 
7.3% of the remaining 233 residues) and E (1.3% 
versus 9.4%). 
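
The percentages quoted above can be turned back into residue counts as a consistency check. The sketch below is an assumption-laden illustration. It takes the segment lengths (76 and 233 residues) and the percentages from the text, assumes the percentages were rounded to one decimal place, and lists every integer count compatible with each. The counts themselves are inferred, not stated in the source.

```python
def implied_counts(percent, length):
    """Integer residue counts whose percentage rounds to the quoted value."""
    return [n for n in range(length + 1)
            if round(100 * n / length, 1) == percent]

# (residue, segment) -> (quoted percentage, segment length)
quoted = {
    ("S", "first 76"): (18.4, 76),
    ("S", "remaining 233"): (7.3, 233),
    ("E", "first 76"): (1.3, 76),
    ("E", "remaining 233"): (9.4, 233),
}
for key, (pct, length) in quoted.items():
    print(key, implied_counts(pct, length))
```

Each quoted percentage is compatible with exactly one integer count, and the two segment lengths sum to 309, the length shown for the S. cerevisiae sequence in Fig. 2.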

Unlike ALS large subunits from plants (Mazur et al., 
1987), the proposed P. purpurea ALS small subunit 
does not contain a chloroplast transit sequence. 
However, this is not necessary as the gene is located in 
the chloroplast genome. Thus it is suggested that in this 
plant, the large subunit is synthesised in the cytoplasm 
and transported to the chloroplast where it associates 
with the chloroplast-encoded small subunit. This 
arrangement is very similar to the situation often 
observed for ribulose 1,5-bisphosphate carboxylase, 
except that in that case it is the larger of the two 
subunits that is encoded by the chloroplast genome 
(Spreitzer, 1993). 

Finding what appears to be an ALS small subunit 
gene in two eukaryotes as diverse as a yeast and a red 
alga suggests that small subunit genes will exist in other 
plants and fungi. However, the location of this gene, as 
well as that for the large subunit, may be variable. For 
example, it has been shown that in another red alga, P. 
umbilicalis, an ALS large subunit is encoded by a chloroplast 
gene (Reith and Munholland, 1993). Further, the 
location of the P. purpurea ALS small subunit gene in 
the chloroplast may be unusual. We have searched for 
this gene in the complete chloroplast genomes of five 
other plants: Nicotiana tabacum (Shinozaki et al., 1986), 
Oryza sativa (Hiratsuka et al., 1989), Pinus thunbergii 
(Tsudzuki et al., 1992), Marchantia polymorpha 
(Ohyama et al., 1986) and Odontella sinensis (Kowallik 
et al., 1995). A total of 608 open reading frames were 
examined but the best match with the motif mentioned 
previously contained only 8 of the 17 conserved residues 
and bore no overall similarity to ALS small subunits; 
in contrast, the three sequences in Fig. 2 match in all 17 
positions. 

Because ALS is the target for several herbicides 
(Schloss et al., 1988), there has been considerable inter- 
est in transforming crop plants with herbicide-resistant 
forms of the enzyme (Odell et al., 1990; Ouellet et al., 
1994). The success of this procedure is likely to be 
limited if a small subunit is an essential component of 
the plant enzyme. Thus, the work reported here may 
have significant practical implications. At present, there 
is no evidence that ALS small subunit genes exist in any 
eukaryotic species apart from S. cerevisiae and P. pur- 
purea, or that even in these species the genes are actually 
expressed. Indeed, it is possible that these two genes 
serve an entirely different function that is unrelated to 
ALS activity. Ultimately the function of any DNA 
sequence, whose identity is based solely on homology, 
can only be proven by experiments designed to evaluate 
that function. In the case of these putative eukaryotic 
ALS small subunit genes, their function might be demonstrated 
by gene disruption or by co-expression with the 
large subunit genes. Current studies in this laboratory 
are examining these possibilities. 



3. Note added in proof 

Recent examination of GenBank expressed sequence 
tags has identified three sequences (two from A. thaliana 
and one from rice) that may represent higher plant ALS 
small subunits. The last of these gives a very good match 
to the P. purpurea sequence; over residues 83-154 there 
are 46 identical, and 10 similar, amino acids. This EST 
is apparently encoded in the nucleus, as it is not present 
in the rice chloroplast genome. 
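
The match reported above can be restated as percentages. The short sketch below only redoes the arithmetic from the figures given in the note (46 identical and 10 similar residues over residues 83-154); the function and variable names are illustrative and are not taken from the original paper.

```python
def match_fractions(identical, similar, first_residue, last_residue):
    """Return (span length, fraction identical, fraction identical or similar)."""
    span = last_residue - first_residue + 1  # residues 83..154 inclusive
    return span, identical / span, (identical + similar) / span


# Figures quoted in the note added in proof.
residues, identity, similarity = match_fractions(46, 10, 83, 154)
print(f"{residues} residues: {identity:.1%} identical, "
      f"{similarity:.1%} identical or similar")
```

Over the 72-residue span this corresponds to roughly 64% identity and 78% identity plus similarity.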
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Deciphering the Message in Protein Sequences: 
Tolerance to Amino Acid Substitutions 

James U. Bowie,* John F. Reidhaar-Olson, Wendell A. Lim, 
Robert T. Sauer 



An amino acid sequence encodes a message that determines 
the shape and function of a protein. This message is 
highly degenerate in that many different sequences can 
code for proteins with essentially the same structure and 
activity. Comparison of different sequences with similar 
messages can reveal key features of the code and improve 
understanding of how a protein folds and how it performs 
its function. 



The genome is manifest largely in the set of proteins that it encodes. It is the 
ability of these proteins to fold into unique three-dimensional structures that allows 
them to function and [...] of the genome. Thus, comprehending the rules that relate 
amino acid sequence to structure is fundamental to an understanding of biological 
processes. Because an amino acid sequence contains all of the information necessary to 
determine the structure of a protein (1), it should be possible to predict structure 
from sequence, and subsequently to infer detailed aspects of function from the 
structure. However, both problems are extremely complex, and it seems unlikely that 
either will be solved in an exact manner in the near future. It may be possible to 
obtain approximate solutions by using experimental data to simplify the problem. In 
this article, we describe how an analysis of allowed amino acid substitutions in 
proteins can be used to reduce the complexity of sequences and reveal important 
aspects of structure and function. 



Methods for Studying Tolerance to 
Sequence Variation 

There are two main approaches to studying the tolerance of an amino acid sequence to 
change. The first method relies on the process of evolution, in which mutations are 
either accepted or rejected by natural selection. This method has been extremely 
powerful for proteins such as the globins or cytochromes, for which sequences from 
many different species are known (2-7). The second approach uses genetic methods to 
introduce amino acid changes at specific positions in a cloned gene and uses 
selections or screens to identify functional sequences. This approach has been used to 
great advantage for proteins that can be expressed in bacteria or yeast, where the 
appropriate genetic manipulations are possible (?, 8-11). The end results of both 
methods are lists of active sequences that can be compared and analyzed to identify 
sequence features that are essential for folding or function. If a particular property 
of a side chain, such as charge or size, is important at a given position, only side 
chains that have the required property will be allowed. Conversely, if the chemical 
identity of the side chain is unimportant, then many different substitutions will be 
permitted. 

*Present address: Department of Chemistry and Biochemistry and the Molecular 
Biology Institute, University of California, Los Angeles, Los Angeles, CA 90024. 

Studies in which these methods were used have revealed that proteins are surprisingly 
tolerant of amino acid substitutions (2-4, 11). For example, in studying the effects 
of approximately [...] single amino acid substitutions at 142 positions in lac 
repressor, Miller and co-workers found that about one-half of all substitutions were 
phenotypically silent (11). At some positions, many different nonconservative 
substitutions were allowed. Such residue positions play little or no role in structure 
and function. At other positions, no substitutions or only conservative substitutions 
were allowed. These residues are the most important for lac repressor activity. 
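
The comparison of active sequences described above can be sketched in a few lines. The example below is a minimal illustration, assuming the active sequences are already aligned and of equal length; the toy sequences and the function name are hypothetical and do not come from the lac repressor data.

```python
def allowed_residues(active_sequences):
    """For each aligned position, collect the residues seen in active sequences."""
    length = len(active_sequences[0])
    if any(len(seq) != length for seq in active_sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return [set(column) for column in zip(*active_sequences)]


# Hypothetical aligned sequences that all passed a functional selection.
active = ["MKALV", "MRALV", "MEGLV", "MQSLI"]
allowed = allowed_residues(active)

for position, residues in enumerate(allowed, start=1):
    label = "invariant" if len(residues) == 1 else "tolerant"
    print(position, "".join(sorted(residues)), label)
```

Positions with a single allowed residue are candidates for structurally or functionally essential sites, whereas positions that accept chemically diverse residues are likely to carry little information.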

What roles do invariant and conserved side chains play in proteins? Residues that are 
directly involved in protein functions such as binding or catalysis will certainly be 
among the most conserved. For example, replacing the Asp in the catalytic triad of 
trypsin with Asn results in a 10^4-fold reduction in activity (12). A similar loss of 
activity occurs in λ repressor when a DNA binding residue is changed from Asn to Asp 
(13). To carry out their function, however, these catalytic residues and binding 
residues must be precisely oriented in three dimensions. Consequently, mutations in 
residues that are required for structure formation or stability can also have dramatic 
effects on activity (10, 14-16). Hence, many of the residues that are conserved in 
sets of related sequences play structural roles. 



Substitutions at Surface and Buried Positions 

In their initial comparisons of the globin sequences, Perutz and co-workers found that 
most buried residues require nonpolar side chains, whereas few features of surface 
side chains are generally conserved (6). Similar results have been seen for a number 
of protein families (2, 4, 5, 7, 17, 18). An example of the sequence tolerance at 
surface versus buried sites can be seen in Fig. 1, which shows the allowed 
substitutions in λ repressor at residue positions that are near the dimer interface 
but distant from the DNA binding surface of the protein (9). These substitutions were 
identified by a functional 



SCIENCE, VOL. 2 






Fig. 1. (A) Amino acid substitutions allowed in a short region of λ repressor. The 
wild-type sequence is shown along the center line. The allowed substitutions shown 
above each position were identified by randomly mutating one to three codons at a time 
by using a cassette method and applying a functional selection (9). (B) The fractional 
solvent accessibility (42) of the wild-type side chain in the protein dimer (43) 
relative to the same atoms in an Ala-X-Ala model tripep- 
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selection after cassette mutagenesis. A histogram of side chain 
solvent accessibility in the crystal structure of the dimer is also shown in Fig. 1. 
At six positions, only the wild-type residue or relatively conservative substitutions 
are allowed. Five of these positions are buried in the protein. In contrast, most of 
the highly exposed positions tolerate a wide range of chemically different side 
chains, including hydrophilic and hydrophobic residues. Hence, it seems that most of 
the structural information in this region of the protein is carried by the residues 
that are solvent inaccessible. 



Constraints on Core Sequences 

Because core residue positions appear to be extremely important for protein folding or 
stability, we must understand the factors that dictate whether a given core sequence 
will be acceptable. In general, only hydrophobic or neutral residues are tolerated at 
buried sites in proteins, undoubtedly because of the large favorable contribution of 
the hydrophobic effect to protein stability (19). For example, Fig. 2 shows the 
results of genetic studies used to investigate the substitutions allowed at residue 
positions that form the hydrophobic core of the NH2-terminal domain of λ repressor 
(20). The acceptable core sequences are composed almost exclusively of Ala, Cys, Thr, 
Val, Ile, Leu, Met, and Phe. The acceptability of many different residues at each core 
position presumably reflects the fact that the hydrophobic effect, unlike hydrogen 
bonding, does not depend on specific residue pairings. Although it is possible to 
imagine a hypothetical core structure that is stabilized exclusively by residues 
forming hydrogen bonds and salt bridges, such a core would probably be difficult to 
construct because hydrogen bonds require pairing of donors and acceptors in an exact 
geometry. Thus the repertoire of possible structures that use a polar core would 
probably be extremely limited (21). Polar and charged residues are occasionally found 
in the cores of proteins, but only at positions where their hydrogen bonding needs can 
be satisfied (22). 

The cores of most proteins are quite closely packed (23), but some volume changes are 
acceptable. In λ repressor, the overall core volume of acceptable sequences can vary 
by about 10%. Changes at individual sites, however, can be considerably larger. For 
example, as shown in Fig. 2, both Phe and Ala are allowed at the same core position in 
the appropriate sequence contexts. Large volume changes at individual buried sites 
have also been observed in phylogenetic studies, where it has been noted that the size 
decreases and increases at interacting residues are not necessarily related in a 
simple complementary fashion (5, 7, 17). Rather, local volume changes are accommodated 
by conformational changes in nearby side chains and by a variety of backbone [...] 



The Informational Importance of the Core 

With occasional exceptions, the core must remain hydrophobic and maintain a reasonable 
packing density. However, since the core is composed of side chains that can assume 
only a limited number of conformations (24), efficient packing must be maintained 
without steric clashes. How important are hydrophobicity, volume, and steric 
complementarity in determining whether a given sequence can form an acceptable core? 
Each factor is essential in a physical sense, as a stable core is probably unable to 
tolerate unsatisfied hydrogen bonding groups, large holes, or steric overlaps (25). 
However, in an informational sense, these factors are not equivalent. For example, in 
experiments in which three core residues of λ repressor were mutated simultaneously, 
volume was a relatively unimportant informational constraint because three-quarters of 
all possible combinations of the 20 naturally occurring amino acids had volumes within 
the range tolerated in the core, and yet most of these sequences were unacceptable 
(20). In contrast, of the sequences that contained only 



Fig. 2. Amino acid substitutions allowed in the core of λ repressor. The wild-type 
side chains are shown pictorially in the approximate orientation seen in the crystal 
structure (43). The lists of allowed substitutions at each position are shown below 
the wild-type side chains. These substitutions were identified by randomly mutating 
one to four residues at a time by using a cassette method and applying a functional 
selection (20). Not all substitutions are allowed in every sequence background. 
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the appropriate hydrophobic residues, a significant fraction were 
acceptable. Hence, the hydrophobicity of a sequence contains more information about 
its potential acceptability in the core than does the total side chain volume. Steric 
compatibility was intermediate between volume and hydrophobicity in informational 
importance. 


The Informational Importance of Surface Sites 

We have noted that many surface sites can tolerate a wide variety of side chains, 
including hydrophilic and hydrophobic residues. This result might be taken to indicate 
that surface positions contain little structural information. However, Bashford et 
al., in an extensive analysis of globin sequences (4), found a strong bias against 
large hydrophobic residues at many surface positions. At one level, this may reflect 
constraints imposed by protein solubility, because large patches of hydrophobic 
surface residues would presumably lead to aggregation. At a more fundamental level, 
protein folding requires a partitioning between surface and buried positions. 
Consequently, to achieve a unique native state without significant competition from 
other conformations, it may be important that some sites have a decided preference for 
exterior rather than interior positions. As a result, many surface sites can accept 
hydrophobic residues individually, but the surface as a whole can probably tolerate 
only a moderate number of hydrophobic side chains. 



Identification of Residue Roles from 
Sets of Sequences 

Often, a protein of interest is a member of a family of related sequences. What can we 
infer from the pattern of allowed substitutions at positions in sets of aligned 
sequences generated by genetic or phylogenetic methods? Residue positions that can 
accept a number of different side chains, including charged and highly polar residues, 
are almost certain to be on the protein surface. Residue positions that remain 
hydrophobic, whether variable or not, are likely to be buried within the structure. In 
Fig. 3, those residue positions in λ repressor that can accept hydrophilic side chains 
are shown in orange and those that cannot accept hydrophilic side chains are shown in 
green. The obligate hydrophobic positions define the core of the structure, whereas 
positions that can accept hydrophilic side chains define the surface. 
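
The surface/core rule stated above can be expressed as a simple classifier. This is a rough sketch under stated assumptions: the set of hydrophilic residues is taken to be the low-hydrophobicity group listed in the Fig. 4 legend (Gln, Asn, Glu, Asp, Lys, Arg), and the aligned toy sequences and function name are hypothetical.

```python
# Assumption: "hydrophilic" means the low-hydrophobicity group from the
# Fig. 4 legend (Gln, Asn, Glu, Asp, Lys, Arg), in one-letter code.
HYDROPHILIC = set("QNEDKR")


def classify_positions(aligned_sequences):
    """Label each aligned position 'surface' if any hydrophilic residue is
    accepted there, otherwise 'core' (obligately hydrophobic or neutral)."""
    labels = []
    for column in zip(*aligned_sequences):
        accepts_hydrophilic = bool(HYDROPHILIC & set(column))
        labels.append("surface" if accepts_hydrophilic else "core")
    return labels


# Hypothetical aligned functional sequences.
aligned = ["LKVAE", "IRVLD", "LEVFK", "VQVAN"]
labels = classify_positions(aligned)
print(labels)
```

The labels are only as reliable as the sequence set is large; a position that happens never to show a hydrophilic residue in a small set would be called core by this heuristic even if it lies on the surface.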

Functionally important residues should be conserved in sets of active sequences, but 
it is not possible to decide whether a side chain is functionally or structurally 
important just because it is invariant or conserved. To make this distinction requires 
an independent assay of protein folding. The ability of a mutant protein to maintain a 
stably folded structure can often be measured by biophysical techniques, by 
susceptibility to intracellular proteolysis (26), or by binding to antibodies specific 
for the native structure (27, 28). In the latter cases, it is possible to screen 
proteins in mutated clones for the ability to fold even if these proteins are 
inactive. Sets of sequences that allow formation of a stable structure can then be 
compared to the sets that allow both folding and function, with the active site or 
binding residues being those that are variable in the set of stable proteins but 
invariant in the set of functional proteins. The DNA-binding residues of Arc repressor 
were identified by this method (8). The receptor-binding residues of human growth 
hormone were also identified by comparing the stabilities and activities of a set of 
mutant sequences (28). However, in this case, the mutants were generated as hybrid 
sequences between growth hormone and related hormones with different binding 
specificities. 
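
The set comparison described above (variable among stably folded proteins, invariant among functional ones) lends itself to a brief sketch. The sequences below are invented for illustration, and the function name is hypothetical; the only logic taken from the text is the comparison rule itself.

```python
def functional_positions(folded_sequences, active_sequences):
    """Return 1-based positions that vary among stably folded sequences
    but are invariant among sequences that both fold and function."""
    positions = []
    folded_columns = zip(*folded_sequences)
    active_columns = zip(*active_sequences)
    for index, (folded_col, active_col) in enumerate(
            zip(folded_columns, active_columns), start=1):
        if len(set(folded_col)) > 1 and len(set(active_col)) == 1:
            positions.append(index)
    return positions


# Hypothetical data: the active set is a subset of the folded set.
active = ["MKRLV", "MKRIV"]
folded = active + ["MKALV", "MKGIV"]  # fold stably but are inactive
candidates = functional_positions(folded, active)
print(candidates)
```

In this toy case only position 3 is flagged: it tolerates substitutions without loss of folding, yet every active sequence retains the same residue there.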




Implications for Structure Prediction 

At present, the only reliable method for predicting a low-resolution tertiary 
structure of a new protein is by identifying sequence similarity to a protein whose 
structure is already known (29, 30). However, it is often difficult to align sequences 
as the level of sequence similarity decreases, and it is sometimes impossible to 
detect statistically significant sequence similarity between distantly related 
proteins. Because the number of known sequences is far greater than the number of 
known structures, it would be advantageous to increase the reach of the available 
structural information by improving methods for detecting distant sequence relations 
and for subsequently aligning these sequences based on structural principles. In a 
normal homology search, the sequence database is scanned with a single test sequence, 
and every residue must be weighted equally. However, some residues are more important 
than others and should be weighted accordingly. Moreover, certain regions of the 
protein are more likely to contain gaps than others. Both kinds of information can be 
obtained from sequence sets, and several techniques have 
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been used to combine such information into more appropriately 
weighted sequence searches and alignments (31). These methods were used to align the 
sequences of retroviral proteases with aspartic proteases, which in turn allowed 
construction of a three-dimensional model for the protease of human immunodeficiency 
virus type 1 (29). Comparison with the recently determined crystal structure of this 
protein revealed reasonable agreement in many areas of the predicted structure (32). 

The structural information at most surface sites is highly degener- 
ate. Except for functionally important residues, exterior positions 
seem to be important chiefly in maintaining a reasonably polar 
surface. The information contained in buried residues is also 
degenerate, the main requirement being that these residues remain 
hydrophobic. Thus, at its most basic level, the key structural 
message in an amino acid sequence may reside in its specific pattern 
of hydrophobic and hydrophilic residues. This is meant in an 
informational sense. Clearly, the precise structure and stability of a 
protein depends on a large number of detailed interactions. It is 
possible, however, that structural prediction at a more primitive 
level can be accomplished by concentrating on the most basic 
informational aspects of an amino acid sequence. For example, 
amphipathic patterns can be extracted from aligned sets of sequences 
and used, in some cases, to identify secondary structures. 

If a region of secondary structure is packed against the hydrophobic core, a pattern 
of hydrophobic residues reflecting the periodicity of the secondary structure is 
expected (33, 34). These patterns can be obscured in individual sequences by 
hydrophobic residues on the protein surface. It is rare, however, for a surface 
position to remain hydrophobic over the course of evolution. Consequently, the 
amphipathic patterns expected for simple secondary structures can be much clearer in a 
set of related sequences (6). This principle is illustrated in Fig. 4, which shows 
helical hydrophobic moment plots for the Antennapedia homeodomain sequence (Fig. 4A) 
and for a composite sequence derived from a set of homologous homeodomain proteins 
(Fig. 4B) (35). The hydrophobic moment is a simple measure of the degree of 
amphipathic character of a sequence in a given secondary structure (34). The 
amphipathic character of the three α-helical regions in the Antennapedia protein (36) 
is clearly revealed only by the analysis of the combined set of homeodomain sequences. 
The secondary structure of Arc repressor, a small DNA-binding protein, was recently 
predicted by a similar method (8) and confirmed by nuclear magnetic resonance studies 
(37). 
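
A helical hydrophobic moment of the kind plotted in Fig. 4 can be sketched as a vector sum over a sliding window. The sketch below follows the scheme given in the Fig. 4 legend (three hydrophobicity groups, an eight-residue window, vectors projected every 100°, magnitudes 1 and 0 for groups H1 and H2). The legend is cut off before the value for H3, so the value of -1 for H3 is an assumption here, as are the function names and the test sequences.

```python
import math

# Hydrophobicity groups from the Fig. 4 legend, in one-letter code.
H1 = set("WIFLMVC")   # high hydrophobicity
H2 = set("YPATHGS")   # medium hydrophobicity
H3 = set("QNEDKR")    # low hydrophobicity


def group_value(residue):
    """Vector magnitude per residue: 1 (H1), 0 (H2), -1 (H3, assumed)."""
    if residue in H1:
        return 1
    if residue in H2:
        return 0
    if residue in H3:
        return -1
    raise ValueError(f"unknown residue: {residue!r}")


def helical_moment(window, angle_deg=100.0):
    """Magnitude of the vector sum for one window, with successive residues
    rotated by angle_deg around the helix axis."""
    x = y = 0.0
    for n, residue in enumerate(window):
        theta = math.radians(n * angle_deg)
        value = group_value(residue)
        x += value * math.cos(theta)
        y += value * math.sin(theta)
    return math.hypot(x, y)


def moment_profile(sequence, window=8):
    """Hydrophobic moment for every full window along the sequence."""
    return [helical_moment(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


# Hypothetical 8-residue windows: an amphipathic pattern versus a uniform one.
print(helical_moment("LKKLLKKL"), helical_moment("LLLLLLLL"))
```

For the composite analysis described in the legend, each position would first be reduced to a single representative residue from the aligned set before the profile is computed; that reduction step is not shown here.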

The specific pattern of hydrophobic and hydrophilic residues in 
an amino acid sequence must limit the number of different structures 
a given sequence can adopt and may indeed define its overall fold. If 
this is true, then the arrangement of hydrophobic and hydrophilic 
residues should be a characteristic feature of a particular fold. Sweet 
and Eisenberg have shown that the correlation of the pattern of 
hydrophobicity between two protein sequences is a good criterion 
for their structural relatedness (38). In addition, several studies 
indicate that patterns of obligatory hydrophobic positions identified 
from aligned sequences are distinctive features of sequences that 
adopt the same structure (4, 29, 38, 39). Thus, the order of 
hydrophobic and hydrophilic residues in a sequence may actually be 
sufficient information to determine the basic folding pattern of a 
protein sequence. 

Although the pattern of sequence hydrophobicity may be a 
characteristic feature of a particular fold, it is not yet clear how such 
patterns could be used for prediction of structure de novo. It is 
important to understand how patterns in sequence space can be 
related to structures in conformation space. Lau and Dill have 
approached this problem by studying the properties of simple 
sequences composed only of H (hydrophobic) and P (polar) groups 
on two-dimensional lattices (40). An example of such a representa- 
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tion is shown in Fig. 5. Residues adjacent in the sequence must 
occupy adjacent squares on the lattice, and two residues cannot occupy the same space. 
Free energies of particular conformations are evaluated with a single term, an 
attraction of H groups. By considering chains of ten residues, an exhaustive 
conformational search for all 1024 possible sequences of H and P residues was 
possible. For longer sequences only a representative fraction of the allowed sequence 
or conformation space could be explored. The significant results were as follows: (i) 
not all sequences can fold into a "native" structure and only a few sequences form a 
unique native structure; (ii) the probability that a sequence will adopt a unique 
native structure increases with chain length; and (iii) the native states are compact, 
contain a hydrophobic core surrounded by polar residues, and contain significant 
secondary structure. Although the gap between these two-dimensional simulations and 
three-dimensional structures is large, the use of simple rules and sequence 
representations yields results similar to those expected for real proteins. 
Three-dimensional lattice methods are also beginning to be developed and evaluated 
(41). 
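
The lattice model summarized above is small enough to enumerate directly. The following is a minimal sketch, not the procedure of reference 40: it assumes an energy of -1 for each pair of H residues that occupy adjacent lattice squares without being adjacent in the chain, and it counts rotated and reflected conformations separately. All function names are illustrative.

```python
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Yield every self-avoiding walk of n residues on a square lattice,
    starting at the origin (symmetry-related copies are not removed)."""
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)
    yield from extend([(0, 0)], {(0, 0)})


def hh_contacts(sequence, path):
    """Count H-H pairs on adjacent squares that are not chain neighbours."""
    index_at = {site: i for i, site in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each lattice pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return contacts


def lowest_energy(sequence):
    """Lowest energy over all conformations (assumed -1 per H-H contact)."""
    return min(-hh_contacts(sequence, path)
               for path in conformations(len(sequence)))


# All H/P sequences of ten residues, as in the exhaustive search described.
all_sequences = ["".join(s) for s in product("HP", repeat=10)]
print(len(all_sequences), lowest_energy("HPPHPPHPPH"))
```

Determining whether a sequence has a unique native structure would additionally require counting the conformations that reach the lowest energy after removing symmetry-related copies, which this sketch leaves out.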



Summary 

There is more information in a set of related sequences than in a single sequence. A 
number of practical applications arise from an analysis of the tolerance of residue 
positions to change. First, such information permits the evaluation of a residue's 
importance to the function and stability of a protein. This ability to identify the 
essential elements of a protein sequence may improve our understanding of the 
determinants of protein folding and stability as well as protein function. Second, 
patterns of tolerance to amino acid substitutions of varying hydrophilicity can help 
to identify residues likely to be buried in a protein structure and those likely to 
occupy 



Fig. 4. Helical hydrophobic moments calculated by using (A) the Antennapedia 
homeodomain sequence or (B) a set of 39 aligned homeodomain sequences (35). The bars 
indicate the extent of the helical regions [...] studies of the Antennapedia 
homeodomain (36). To determine hydrophobic moments, residues were assigned to one of 
three groups: H1 (high hydrophobicity = Trp, Ile, Phe, Leu, Met, Val, or Cys); H2 
(medium hydrophobicity = Tyr, Pro, Ala, Thr, His, Gly, or Ser); and H3 (low 
hydrophobicity = Gln, Asn, Glu, Asp, Lys, or Arg). For the aligned homeodomain 
sequences, the residues at each position were sorted by their hydrophobicity by using 
the scale of Fauchere and Pliska (45). Arg and Lys were not counted unless no other 
residue was found at the position, because they contain long aliphatic side chains and 
can thereby substitute for nonpolar residues at some buried sites. To account for 
possible sequence errors and rare exceptions, the most hydrophilic residue allowed at 
each position was discarded unless it was observed twice. The second most hydrophilic 
residue was then chosen to represent the hydrophobicity of each position. An 
eight-residue window was used and the vectors projected radially every 100°. The 
vector magnitudes were assigned a value of 1, 0, or -1 for positions where the 
hydrophobicity group was H1, H2, or 
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Fig. 5. A representation of one compact conformation for a particular 
sequence of H and P residues on a two-dimensional square lattice. [Adapted from (40), 
with permission of the American Chemical Society] 
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Transforming Growth Factor α: Mutation of Aspartic Acid 47 and 
Leucine 48 Results in Different Biological Activities 
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To study the relationship between the primary structure of transforming growth factor α (TGF-α) and some 
of its functional properties (competition with epidermal growth factor (EGF) for binding to the EGF receptor 
and induction of anchorage-independent growth), we introduced single amino acid mutations into the sequence 
for the fully processed, 50-amino-acid human TGF-α. The wild-type and mutant proteins were expressed in a 
vector by using a yeast α mating pheromone promoter. Mutations of two amino acids that are conserved in the 
family of the EGF-like peptides and are located in the carboxy-terminal part of TGF-α resulted in different 
biological effects. When aspartic acid 47 was mutated to alanine or asparagine, biological activity was retained; 
in contrast, substitutions of this residue with serine or glutamic acid generated mutants with reduced binding 
and colony-forming capacities. When leucine 48 was mutated to alanine, a complete loss of binding and 
colony-forming abilities resulted; mutation of leucine 48 to isoleucine or methionine resulted in very low 
activities. Our data suggest that these two adjacent conserved amino acids in positions 47 and 48 play different 
roles in defining the structure and/or biological activity of TGF-α and that the carboxy terminus of TGF-α is 
involved in interactions with cellular TGF-α receptors. The side chain of leucine 48 appears to be crucial either 
indirectly in determining the biologically active conformation of TGF-α or directly in the molecular recognition 
of TGF-α by its receptor. 



Transforming growth factor α (TGF-α) is a polypeptide of 50 amino acids. First 
isolated from a retrovirus-transformed mouse cell line (9), it has subsequently been 
found in human tumor cells (10, 29), in the early rat embryo (18), and recently in 
cell cultures from the pituitary gland (23). TGF-α appears to be closely related to 
epidermal growth factor (EGF) structurally and functionally (19, 20). The two peptides 
apparently bind to the same receptor, and both induce anchorage-independent growth of 
certain nontransformed cells, such as NRK cells, in the presence of TGF-β (1). 

Comparison of amino acid sequences reveals about 35% homology among the EGF-like 
peptides (rat [27], mouse [25], and human [13] EGFs and rat [19] and human [12] 
TGF-αs). Some viral peptides (Shope fibroma growth factor [6], vaccinia growth factor 
[2], and myxoma growth factor [30]) also share homologies with the EGF-like peptides. 

If TGF-α is involved in transformation, a TGF-α antagonist could be an important 
therapeutic tool in the treatment of certain types of malignancies. An understanding 
of the conformational and dynamic properties of the TGF-α molecule is basic to the 
design of an antagonist. A hypothetical antagonist would bind to the same receptor as 
TGF-α, but would not induce the series of proliferative and transforming events 
induced by TGF-α. To obtain such a molecule it is necessary to dissociate interactions 
responsible for binding from those involved in signal transduction. We decided to 
approach the problem by way of site-directed mutagenesis of a human sequence of TGF-α. 
In this report we describe our first series of mutations, which were carried out at 
residues Asp-47 and Leu-48, in the carboxy-terminal part of TGF-α; these two amino 
acids are highly conserved in the EGF-like family of peptides. We show that these two 
adjacent residues 
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play different roles in the structure and/or function of 
TGF-α. 

MATERIALS AND METHODS 
Cells. Normal rat kidney (NRK) cells were grown in 
Dulbecco modified Eagle medium containing 10% (vol/vol) 
calf serum. 

TGF-α gene. The sequence of the 50-amino-acid human TGF-α was originally derived from 
a human TGF-α precursor cDNA (12). The coding sequence is preceded by an ATG 
methionine codon and followed by a TAA stop codon and is flanked by EcoRI restriction 
sites. This EcoRI fragment combines the 59-base-pair EcoRI-NcoI fragment from plasmid 
pTE5 (12) with the 111-base-pair NcoI-EcoRI fragment from plasmid pyTE2 (11). The 
resulting EcoRI fragment was inserted in M13mp18 for site-directed mutagenesis. 

Synthesis and purification of oligonucleotides and oligonu- 
cleotide-directed mutagenesis. The synthesis and purification 
of 20- to 27-nucleotide oligonucleotides were carried out as 
described previously (31). The one or two nucleotides re- 
sponsible for the mutation were located in the middle of the 
oligonucleotide. Mutagenesis was performed by published 
procedures (21, 33). The sequences of the mutant clones 
were verified by the method of Sanger et al. (25). 

Yeast shuttle vector. The vector YEp70αT contains a yeast α-factor pheromone promoter 
and prepro sequence for the expression of TGF-α (15). The mutant TGF-α coding sequence 
was inserted in the EcoRI site of plasmid YEp70αT and expressed in the form of a 
fusion protein consisting of 92 amino acids from the prepro sequence of the yeast α 
factor attached to the amino terminus of TGF-α (28). The yeast cleaves the precursor 
and secretes TGF-α with 8 amino acids fused to it (4 are encoded by the prepro 
sequence of α-factor, and the other 4 are encoded by the DNA sequence added to the 
insert of the TGF-α gene). The last of these residues is a methionine, which allows 
the cleavage of the secreted fusion 
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protein by cyanogen bromide (CNBr) and the release of a 
mature TGF-α (50 amino acids) (see Results). 
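
The release step described above relies on CNBr cleaving the peptide bond on the carboxyl side of methionine. The sketch below illustrates that logic only; the eight-residue leader and the 50-residue "mature" stretch are placeholder strings, not the actual sequences, and the function name is hypothetical. It also assumes, as the text implies, that the mature peptide contains no internal methionine.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M),
    mimicking CNBr cleavage on the carboxyl side of Met."""
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


# Placeholder sequences: 8 fused residues ending in Met, then 50 residues
# standing in for mature TGF-α (composition is NOT the real sequence).
leader = "AAAAAAAM"
mature = "G" * 50
fragments = cnbr_fragments(leader + mature)
print([len(f) for f in fragments])
```

If the mature peptide did contain a methionine, the same function would report additional fragments, which is why the approach suits only methionine-free products.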

Yeast strain and transformation. The yeast Saccha- 
romyces cerevisiae 20B-12 {MATa. trpl pep4-3) (17) was 
obtained from the Yeast Genetics Stock Center, Berkeley, 
Calif. S. cerevisiae 20B-12 was grown in YEPD medium (1% 
yeast extract [Difco Laboratories], 2% Bacto-Peptone 
[Difco], 2% glucose). When the culture reached an optical 
density at 660 nm of 1, spheroplasts were prepared (14) for 
transformation. For each transformation we used 10 to 15 jxg 
of purified plasmid DNA. 

Partial purification of TGF-α mutants. At 3 days after
transformation, five individual colonies of transformants
were grown to saturation in YEPD medium. The amount of
protein in the yeast medium was measured by the method of
Bradford (3), and the amount of mutant TGF-α secreted in
the yeast medium was determined by radioimmunoassay.
The clones which secreted the highest amount of mutant
TGF-α were used to grow a 1-liter culture in YNB-CAA
medium (0.67% yeast nitrogen base, 20 g of glucose per liter,
10 g of Casamino Acids [Difco] per liter). After the culture
reached saturation (optical density at 660 nm of 10 to 12) (48
h in an air shaker at 30°C), the yeast conditioned medium
was dialyzed extensively against 1 M acetic acid in
3,000-molecular-weight cutoff dialysis tubing. Usually 250 ml of
dialyzed culture was lyophilized, suspended in 10 ml of 70%
formic acid, and treated with CNBr (molar excess of 500) for
20 h at room temperature. The CNBr was subsequently
evaporated, and the samples were lyophilized. CNBr-treated
samples were suspended in 1 ml of 1 M acetic acid, loaded on
a Bio-Gel P30 column (30 by 1.5 cm [Bio-Rad Laboratories]),
and eluted with 1 M acetic acid. Fractions of 1 ml were
collected. Aliquots were lyophilized, suspended in binding
buffer (minimum essential medium containing 1 mg of bovine
serum albumin per ml and 25 mM HEPES
[N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid; pH 7.4]),
neutralized if necessary to pH 7.4, and tested in EGF-binding
competition and soft-agar assays, as well as in
radioimmunoassay.

Radioimmunoassays. The amounts of TGF-α secreted in
the yeast medium were determined by radioimmunoassay
with the immunoglobulin G fraction of a polyclonal antibody,
34D, raised against recombinant human TGF-α (4), in
0.1 M Tris (pH 7.5)-0.15 M NaCl-2.5 mg of bovine serum
albumin per ml. The amounts of partially purified TGF-α
present in the P30 column fractions were measured by using
the Biotope RIA kit with polyclonal antibody against human
TGF-α (a gift from W. Hargreaves, Biotope), under
denaturing conditions, as recommended by the supplier.

EGF-binding competition assay and soft-agar assay. Both
EGF-binding competition and soft-agar assays have been
described previously (1).

RESULTS 

Rationale for mutations in the carboxyl terminus of TGF-α.

Figure 1 shows the amino acid sequence of TGF-α in which
the residues that are conserved among all the EGF-like
peptides described thus far (EGF, TGF-α, and EGF-like
viral proteins) are enclosed in bold circles. Among the 11
conserved amino acids, there are 6 Cys and 2 Gly residues,
which presumably play essential roles in determining the
overall conformation of the molecule. We concentrated on
the two conserved amino acids in the carboxyl terminus,
Asp-47 and Leu-48. The Asp in position 47 is conserved
among the EGFs and TGF-α (human or murine), but not
among the EGF-like viral proteins (vaccinia growth factor,
Shope fibroma growth factor, or myxoma growth factor),
whereas Leu-48 is conserved among all the EGF-like peptides
so far described. In both mouse and human EGF, the
two corresponding residues (Asp-46 and Leu-47) are located
near the surface of the protein (8, 22, 22a). We designed a
series of mutations in these two positions.

FIG. 1. Mutations in the carboxy terminus of human TGF-α. The
amino acids conserved in all the family of EGF-like growth factors
(human and murine EGFs and TGFs, as well as the gene products of
the vaccinia virus [vaccinia growth factor], the Shope fibroma virus
[Shope fibroma growth factor], and the myxoma virus [myxoma
growth factor]) are enclosed in bold circles. The mutations of amino
acids at positions 47 and 48 are indicated. Symbols: A, Ala; C, Cys;
D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M,
Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp;
Y, Tyr.

Asp-47 has been mutated to Glu, Asn, Ser, and Ala. Glu 
was chosen because it has the same charge as and a larger 
size than Asp; Asn has a similar side-chain structure, but is 
uncharged; Ser is smaller but still polar; Ala is smaller and 
nonpolar. 

Leu-48 has been mutated to Ile and Met, which are both
large, nonpolar residues like Leu, and to Ala, which is
nonpolar but smaller. We introduced the chosen mutations
by site-directed mutagenesis of the cloned human TGF-α
gene, using synthetic oligonucleotides.

Construction of the yeast α mating pheromone-human
TGF-α plasmid. The TGF-α expression vector pyTE1 (Fig.
2) was constructed by using plasmid YEp70αT (15), which
contains the 2µm origin of replication and yeast TRP1 gene
for its replication and selective maintenance, respectively.
YEp70αT also contains the yeast α-factor promoter, the
α-factor prepro sequence coding for 89 amino acids, and the
sequence for 3 amino acids resulting from the introduction of
XbaI and EcoRI sites. The human mature TGF-α sequence
(12) is contained in a 170-base-pair EcoRI fragment which
includes an ATG (Met) codon preceding the sequence of
TGF-α and a TAA (stop) codon followed by 8 nucleotides.
This TGF-α sequence was inserted in the unique EcoRI site
of YEp70αT. Clones with the proper orientation were
selected, and DNA was isolated for yeast transformation.

FIG. 2. Structure of the S. cerevisiae 8.2-kilobase shuttle vector
pyTE1. The secretion of TGF-α is under the transcriptional
control of the yeast α-factor promoter and prepro sequence.
The yeast 2µm origin of replication and the selective yeast
TRP1 gene are indicated. The TGF-α gene, preceded by an
initiation (ATG) codon and followed by a stop (TAA) codon, is
inserted in the EcoRI site. Details are given in Materials and
Methods and in Results.

Measurement of TGF-α secreted by S. cerevisiae. The
amount of total proteins secreted into the yeast culture was
10 ± 1 µg/ml for wild-type as well as mutant TGF-α as
determined by the method of Bradford (3). Before further
purification was attempted, we wanted to determine whether
the mutated TGF-α proteins were being secreted by the
yeast. The low pH of the yeast medium, as well as the acidic
proteins secreted in the yeast culture, precluded biological
assay of secreted mutants. Therefore, immunological methods
were used. Wild-type and mutant TGF-αs were secreted
at a level of 100 to 200 ng/ml and 10 to 500 ng/ml,
respectively (as determined by radioimmunoassay with
polyclonal antibody 34D). We thus estimate that the percentage
of TGF-α secreted in the yeast culture is at least 1% of the
total protein secreted. We cannot yet assess whether the
variations in the levels of secretion of different mutant
TGF-α proteins are real or whether one single-amino-acid
substitution drastically affects the recognition by the
antibody. The latter hypothesis is the more likely, since the use
of another polyclonal antibody (Biotope) under denaturing
conditions enabled us to detect certain TGF-α mutants (such
as [Ala-47]-TGF-α, in which the amino acid in position 47 of
human TGF-α is mutated to an alanine) that were poorly
detected by 34D, under nondenaturing as well as denaturing
conditions. After the amount of TGF-α mutant proteins was
estimated, the medium was extensively dialyzed against 1 M
acetic acid and lyophilized as described in Materials and
Methods.
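
The "at least 1%" estimate follows from the two measurements reported above (about 10 µg/ml of total secreted protein and 100 to 200 ng/ml of wild-type TGF-α). A minimal sketch of that arithmetic is given here for illustration; the function name is hypothetical and the inputs are simply the values quoted in the text.

```python
def percent_of_total(tgf_ng_per_ml, total_protein_ug_per_ml):
    """Return the growth factor as a percentage of total secreted protein."""
    total_ng_per_ml = total_protein_ug_per_ml * 1000.0  # convert ug/ml to ng/ml
    return 100.0 * tgf_ng_per_ml / total_ng_per_ml


# Values quoted in the text: 100 to 200 ng/ml against about 10 ug/ml total protein.
low = percent_of_total(100, 10)
high = percent_of_total(200, 10)
print(f"wild type: {low:.1f}% to {high:.1f}% of total secreted protein")
```

With these inputs the wild-type range works out to 1 to 2% of total secreted protein, which is consistent with the stated lower bound.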

Partial purification of yeast-secreted TGF-α. Although the
yeast shuttle vector was constructed in such a way as to
secrete TGF-α with 8 amino acids fused to the N terminus,
it was often observed that a significant fraction of the
secreted TGF-α was in a higher-molecular-weight fragment
corresponding to the size expected from an uncleaved
(unprocessed) 92-amino-acid fusion protein. Since a Met had
been introduced at the N terminus of TGF-α and since
TGF-α contains no Met in its sequence, CNBr treatment
could be used to cleave either of these 8- or 92-amino-acid
N-terminal peptides and release the complete 50-amino-acid
TGF-α. Indeed, CNBr treatment of yeast-secreted proteins
resulted in the conversion of high-molecular-weight TGF-α
into the 6,000-molecular-weight species, as revealed by
Western immunoblot (data not shown).

CNBr-cleaved samples (see Materials and Methods) were
purified on a Bio-Gel P30 column. Figure 3 shows the elution
profile of the proteins, as well as the results of a
radioreceptor assay and a soft-agar assay performed on aliquots of the
column fractions. The A280 profile shows two major peaks of
eluted proteins, one corresponding to the void volume and
the other one to proteins of molecular weight <3,000.
Aliquots of the column fractions were tested for their ability
to compete with ¹²⁵I-EGF for binding to the receptor. The
fractions that were the most active in this assay were located
between the two major protein peaks, in an area where
relatively few proteins eluted. Although some activity was
found in the first protein peak (void volume), this was
considerably reduced on treatment with stronger CNBr (data
not shown).

FIG. 3. Purification of yeast-secreted wild-type TGF-α. The
purification procedure is described in Materials and Methods and in
Results. Aliquots of every other fraction of the Bio-Gel P30 column
were tested for their abilities to compete with ¹²⁵I-EGF for binding
to the EGF receptor (A) and to induce colony formation (>62 µm)
on NRK cells in soft agar in the presence of TGF-β (1 ng/ml) (•).
The profile of the proteins was also determined.

Aliquots of each fraction were also tested for their ability
to induce anchorage-independent growth of NRK cells in
soft agar in the presence of TGF-β (1 ng/ml). The receptor
binding and colony-forming activity superimposed almost
exactly (Fig. 3). Analysis by polyacrylamide gel
electrophoresis with silver staining, as well as by Western blot, of the
column fractions shows that our purification procedure
(CNBr cleavage followed by P30 sizing column) eliminates
high-molecular-weight proteins (data not shown). Since pure
TGF-α migrates in a broad band on sodium dodecyl
sulfate-polyacrylamide gel electrophoresis (32), this technique
cannot be used for proper assessment of the degree of
separation of TGF-α from low-molecular-weight contaminating
proteins. Nevertheless, within our detection levels the
amounts of TGF-α present in the column fractions (detected
by radioimmunoassay using the antibody from Biotope)
correlated with the amounts observed on sodium dodecyl
sulfate-polyacrylamide gel electrophoresis (data not shown).

Comparison of binding and colony-forming activity of
TGF-α partially purified from yeast media. It was important
to show that wild-type TGF-α secreted from S. cerevisiae
had the expected biological properties and that its activity in
soft-agar and radioreceptor assays was equivalent. For these
assays, the amount of EGF-competing activity present in the
most active fraction of the P30 column of wild-type TGF-α
was measured in terms of EGF equivalents. The dilution
curve had a slope that was parallel to that of the EGF
standard. This value was also used to measure the
colony-forming activity of the partially purified wild-type TGF-α
(with EGF as a standard in the assay). The colony-forming
activity of the partially purified wild-type TGF-α
corresponded exactly to that of EGF (Fig. 4). Thus, we have
partially purified a wild-type 50-amino-acid TGF-α showing
the expected binding and colony-forming activities, which
provides a reference substance for mutant TGF-αs that
might show a dissociation of binding and colony-forming
abilities.

TABLE 1. Biological and biochemical activities of mutant TGF-α
proteins secreted by S. cerevisiae and partially purified
[Table body not recovered. Column headings: insert in the yeast;
EGF equivalence (ng/ml) in radioreceptor and soft-agar assays;
amount of TGF-α by radioimmunoassay.]

FIG. 4. Correlation between the activities in the binding and
colony-forming assay for the partially purified wild-type TGF-α
secreted by S. cerevisiae. The activity in the radioreceptor assay of
the peak fraction from the P30 column was determined in EGF
equivalent concentration. The value obtained was used for the
soft-agar assay. Colonies of >62 µm (A) and the EGF standard (•)
are shown. Axis label: EGF equivalent (ng/ml).

Biological and biochemical activities of the partially purified
TGF-α mutant proteins. Mutated TGF-αs were expressed by
using the yeast system and partially purified on Bio-Gel P30
columns as described in Materials and Methods. Mutant
TGF-αs were usually obtained from two different clones of
yeast transformants. The CNBr-cleaved samples were
purified through different Bio-Gel P30 columns for each mutant
protein to avoid any possible contamination from one
peptide to another. The purification profiles observed with the
mutant TGF-αs were similar to those obtained for the
wild-type TGF-α. Aliquots of the P30 column fractions were
tested in radioreceptor and soft-agar assays. For all mutant
proteins, the highest activity in both assays was always
found in the same fraction of the Bio-Gel P30 column effluent
(peak fraction). Extensive purification of a series of mutant
proteins for screening purposes is not practical. Therefore,
we needed a quantitation system that would allow us to
compare mutant proteins with each other. Thus, the amount
of TGF-α present in the peak fraction was estimated by
radioimmunoassay with an antiserum to native TGF-α
(obtained from W. Hargreaves), under denaturing conditions, as
described in Materials and Methods. All values given in
Table 1 were obtained from the peak fraction.

The controls done with the wild-type TGF-α showed (Fig.
4; Table 1) that binding and transforming activity were
equivalent. The yeast vector without a TGF-α insert did not
secrete any EGF-like proteins, as determined by both
radioreceptor and soft-agar assay.

Two types of results were obtained upon assay of mutant
proteins having different amino acid substitutions at Asp-47.
In both [Ala-47]-TGF-α and [Asn-47]-TGF-α, binding
ability was retained. Soft-agar and radioreceptor activities
correlated for [Asn-47]-TGF-α; there was a lower value for
colony-forming activity than for EGF-binding competition
for [Ala-47]-TGF-α. [Ser-47]-TGF-α and [Glu-47]-TGF-α
appeared to have lower activities in both assays than either
wild-type TGF-α or [Ala-47]-TGF-α and [Asn-47]-TGF-α.
These results indicate that neither the carboxyl charge nor
the polarity of Asp-47 is essential for biological activity.

The effects of mutation of Leu-48, one of the 11 amino
acids perfectly conserved among all the EGFs, TGF-αs, and
viral EGF-like proteins, are dramatic. [Ala-48]-TGF-α
totally lacked binding and colony-forming activity.
[Ile-48]-TGF-α and [Met-48]-TGF-α had very little biological
activity compared with wild-type TGF-α. Another
substitution, [Met-48]-TGF-α, resulted in a truncated mutant lacking
the last 2 amino acids and having a substitution of Leu to
homoserine at position 48 following treatment with CNBr.
Alternatively, if [Met-48]-TGF-α was not treated with
CNBr, fusion proteins of TGF-α (mutated to Met in position
48) with 8 or 92 amino acids attached at the N terminus were
obtained. Very low activities in binding and soft-agar assays
were found for these mutants, whether or not they were
cleaved with CNBr. Experiments on EGF and TGF-α have
shown that an N-terminal extension does not markedly
modify EGF-binding activity (12, 26). Therefore, the loss of
activity obtained with [Met-48]-TGF-α that has not been
CNBr treated was probably due to the mutation itself and
not to the N-terminally extended fusion protein. We do not
know whether the loss of activity observed with the TGF-α
shortened to 48 amino acids and having a substitution of
Leu-48 to homoserine is due only to the mutation or also to
the lack of the last 2 amino acids.
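
The CNBr outcomes described above can be illustrated with a short sketch. It assumes only the general rule that CNBr cleaves on the C-terminal side of Met and leaves homoserine in its place (written here as a lowercase "m"). The sequences are placeholders rather than the real TGF-α or leader sequences: only the Met at the end of the 8-residue extension and the residues at positions 47 and 48 are meant to be meaningful.

```python
def cnbr_cleave(seq):
    """Split a one-letter peptide after every Met (M).

    The Met at the C terminus of each released fragment is written as 'm'
    to mark its conversion to homoserine.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(seq):
        if residue == "M":
            fragments.append(seq[start:i] + "m")
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


# Placeholder sequences (hypothetical): X and G stand for unspecified residues.
leader = "XXXXXXXM"                  # 8-residue N-terminal extension ending in Met
mature = "G" * 46 + "D" + "L" + "GG"  # 50 residues, Asp-47 and Leu-48 marked
met48 = mature[:47] + "M" + mature[48:]  # Leu-48 replaced by Met

wild_type_fragments = cnbr_cleave(leader + mature)
met48_fragments = cnbr_cleave(leader + met48)
print([len(f) for f in wild_type_fragments])  # leader and full-length peptide
print([len(f) for f in met48_fragments])      # leader, 48-residue peptide, 2-residue tail
```

In this toy model the wild-type fusion releases an intact 50-residue peptide, while the Met-48 variant is cut a second time, giving a 48-residue product ending in homoserine and a separate 2-residue tail, which matches the truncation described in the text.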

The data obtained by radioimmunoassay on the partially
purified wild-type and mutant TGF-α show that the amount
of TGF-α detected was always higher than the amount
determined by measurement of biological activity. This may
be due to the presence in the fraction of a certain percentage
of incorrectly folded TGF-α that might be recognized in a
radioimmunoassay under denaturing conditions but would
not be biologically active. None of the mutant proteins
seemed to be present in amounts equivalent to those
observed for wild-type TGF-α in the partially purified fractions
(whether radioimmunoassay, radioreceptor, or soft-agar
assay was used for quantitation). It is not clear whether
consistently less TGF-α was produced by the mutant
constructs than by the wild type or whether the secreted mutant
proteins were simply less well recognized by the antibody.
Because of these uncertainties, the biological activities of
the different mutant proteins cannot be accurately related to
a known amount of mutant TGF-α protein. Even though
radioimmunoassay should be used with caution for a
quantitative evaluation of mutant TGF-α proteins, a positive
reaction demonstrates that immunoreactive TGF-α was
present in the P30 peak fraction for each mutant. Therefore,
the fact that one of the mutant proteins ([Ala-48]-TGF-α) is
biologically inactive can be attributed to the mutation itself,
and not to the lack of production of the mutant protein by the
yeast or its loss through purification. However, if the mutant
proteins are in fact as immunoreactive as the wild type, then
[Ala-47]-TGF-α and [Asn-47]-TGF-α are as active as
wild-type TGF-α and [Glu-47]-TGF-α and [Ser-47]-TGF-α are
less active; in contrast, [Ile-48]-TGF-α and [Met-48]-TGF-α
are almost inactive. The differences between mutation of
Asp-47 and Leu-48 would then be even more striking.

DISCUSSION 

TGF-α shows sequence homologies with EGF, and both
growth factors share the same cellular receptors (20). Even
though EGF was discovered 25 years ago (7) and its
properties have been extensively studied over the years (5), the
binding site of EGF to its receptor has still not been
determined, and the relationship between structure and
function of EGF/TGF-α is still to be discovered.
Particularly, we do not know whether binding to the receptor and
signal transduction occur through one or more domains of
the molecule or through which amino acids. We approached
the question by performing site-directed mutagenesis of
TGF-α and focused our attention on two adjacent amino
acids, Asp-47 and Leu-48, located in the carboxy terminus
and highly conserved in the EGF-like family of peptides.
Unexpectedly, these two amino acids showed very different
sensitivities to mutation and particularly to a substitution to
Ala: [Ala-47]-TGF-α retained binding and colony-forming
activities, whereas [Ala-48]-TGF-α completely lost both
activities. These data show that Asp-47 and Leu-48 play very
different roles in defining the structure and/or the activity of
TGF-α. The other mutations performed on Asp-47 were
substitutions to Asn, Ser, and Glu. [Asn-47]-TGF-α, like
[Ala-47]-TGF-α, was active in binding and induction of
colony formation, but [Ser-47]-TGF-α and [Glu-47]-TGF-α
showed weaker growth factor activities. These results
indicate that neither the carboxyl charge nor the polarity of
Asp-47 is essential for biological activity. Interestingly, two
of the EGF-like viral proteins, myxoma growth factor and
Shope fibroma growth factor (6, 30), have Asn instead of Asp
in position 47; we have shown that [Asn-47]-TGF-α retains
biological activity.

Substitution of Leu-48 to Met and Ile led to mutant
proteins with very low activities, whereas substitution to Ala
led to complete loss of activity. We did not expect that a
mutation of Leu to Ile (which have similar sizes and
polarities) would cause such a strong effect. Thus, Leu-48, which
is conserved perfectly among all the EGF-like peptides,
seems to be essential, through its exact geometry, for the
biological activity of TGF-α.

The mutant proteins tested so far, when active, showed
parallel behaviors in binding and colony formation. Some
mutant proteins lost all activities, and we assume that the
binding capacity has been lost. We have not been able to
dissociate the binding and colony-forming abilities by using
any of the present series of mutant proteins, and it is
necessary to screen more of them in search of an antagonist.

Results relating to the biological activity of EGF show that
derivatives of mouse EGF and human EGF (EGF 1-47)
lacking the carboxy-terminal 6 amino acids as a result of
enzymatic digestion are less potent than the intact molecule
in mitogenic stimulation of fibroblasts, but retain full
biological activity in in vivo assays (inhibition of gastric acid
secretion) (16). On the other hand, naturally occurring
truncated forms of rat EGF, which lack the carboxy-terminal
5 amino acids (rEGF 2-48), are as potent as mouse EGF
(mEGF 1-53) in receptor-binding and mitogenic assays (21).
We do not know whether the discrepancies observed are due
to the origin of the molecule (artificial or natural) or to the
type of bioassay used. In any event, all of these EGF-related
molecules, which are shorter than mouse or human EGF,
still retain Leu-47. We have shown that in TGF-α, the
corresponding residue, Leu-48, is critical for the biological
activity.

Recent data on the three-dimensional structure of mouse
EGF obtained by nuclear magnetic resonance show that
even though Asp-46 and Leu-47 (Asp-47 and Leu-48 in
TGF-α) are both solvent accessible (8, 22, 22a), their side
chains point in opposite directions in the beta-sheet
structure. Therefore, the role of these adjacent amino acids in the
structure and, consequently, the function of EGF might be
very different. Our data show that the amino acids Asp-47
and Leu-48 of TGF-α are not equally important for the
biological activity of TGF-α, despite their conservation
among the EGF-like peptides. From the dramatic loss in
biological activity which is characteristic of mutation of
Leu-48, we also suggest that this residue is involved in
binding to the cellular receptors either by direct interaction
with the receptor or by providing the proper conformation to
the molecule.

Catalytic Plasticity of Fatty
Acid Modification Enzymes
Underlying Chemical Diversity
of Plant Lipids

Pierre Broun, John Shanklin, Ed Whittle, Chris Somerville

Higher plants exhibit extensive diversity in the composition of seed storage
fatty acids. This is largely due to the presence of various combinations of double
or triple bonds and hydroxyl or epoxy groups, which are synthesized by a family
of structurally similar enzymes. As few as four amino acid substitutions can
convert an oleate 12-desaturase to a hydroxylase and as few as six result in
conversion of a hydroxylase to a desaturase. These results illustrate how
catalytic plasticity of these diiron enzymes has contributed to the evolution of
the chemical diversity found in higher plants.



All higher plants contain one or more oleate
desaturases that catalyze the O2-dependent
insertion of a double bond between carbons 12
and 13 of lipid-linked oleic acid (18:1Δ9) to
produce linoleic acid (18:2Δ9,12) (1). In
contrast, only 14 species in 10 plant families have
been found to accumulate the structurally
related hydroxy fatty acid, ricinoleic acid
(D-12-hydroxyoctadec-cis-9-enoic acid) (2), which is
synthesized by an oleate hydroxylase that
exhibits a high degree of sequence similarity to
oleate desaturases (3). The oleate desaturases
and hydroxylases are integral membrane
proteins, which are members of a large family of
functionally diverse enzymes that includes
alkane hydroxylase, xylene monooxygenase,
carotene ketolase, and sterol methyloxidase (1).
These nonheme iron-containing enzymes use a
diiron cluster for catalysis (4) and contain three
equivalent histidine clusters that have been
implicated in iron binding and shown to be
essential for catalysis (1). This class of proteins
exhibits no significant sequence identity to the
soluble diiron-containing enzymes which
represent a similar diversity of enzymatic activities
that include plant acyl-ACP desaturases,
methane monooxygenase, propene monooxygenase,
and the R2 component of ribonucleotide
reductase (1, 5). The catalytic activities of these
enzymes have been mimicked by a synthetic
diiron-containing complex with a coordination
sphere composed entirely of nitrogen ligands
(6).

The oleate hydroxylase from the crucifer
Lesquerella fendleri has about 81% sequence
identity to the oleate desaturase from the
crucifer Arabidopsis thaliana and about 71%
sequence identity to the oleate hydroxylase
from Ricinus communis (7). The observation
that these crucifer desaturase and hydroxylase
enzymes are more similar than the two
hydroxylases, and the presence of ricinoleic
acid in a small number of distantly related
plant species, suggests that the capacity to
synthesize ricinoleate has arisen independently
several times during the evolution of higher
plants, by the genetic conversion of
desaturases to hydroxylases.
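
Percentages such as the 81% and 71% quoted above are computed from pairwise alignments. The sketch below shows one common convention, assumed here for illustration only: identical residues divided by aligned columns in which neither sequence has a gap. The source does not state which convention or alignment program was used, and the two short sequences are invented examples, not desaturase or hydroxylase sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment; '-' marks a gap.
toy_a = "MKTAYIAKQR"
toy_b = "MKTGYIAR-R"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so identity values are only comparable when the same definition is used throughout.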

Comparison of the amino acid sequences of
the hydroxylases from L. fendleri and R.
communis with the sequences for oleate desaturases
from Arabidopsis, Zea mays, Glycine max (two
sequences), R. communis, and Brassica napus
revealed that only seven residues were strictly
conserved in all of the six desaturases but
divergent in both of the hydroxylases. The role of
these seven residues was assessed by using
site-directed mutagenesis to replace the
residues found in the Lesquerella hydroxylase,
LFAH12, with those from the equivalent
positions in the desaturases (8, 9). In a reciprocal
experiment, we replaced the seven residues in
the Arabidopsis FAD2 oleate desaturase with
the corresponding Lesquerella hydroxylase
residues (10). The activity of the modified and
unmodified genes was then determined by
expressing them in yeast and transgenic plants,
before analyzing the composition of the total
fatty acids. Technical difficulties limited the
utility of direct measurement of enzyme
activity in cell extracts (11).

The mutant hydroxylase and desaturase
genes containing all seven substitutions
(designated m7LFAH12 and m7FAD2, respectively)
were expressed in yeast cells under
transcriptional control of the GAL1 promoter.
Transgenic cells were harvested after induction and
their total fatty acid composition determined by
gas chromatography. Wild-type yeast cells do
not accumulate detectable concentrations of
diunsaturated or hydroxylated fatty acids (12).
Expression of FAD2 caused the accumulation
of about 4% diunsaturated fatty acids
[illegible] but no detectable hydroxy fatty acids
(Fig. 1). Expression of LFAH12 caused the
accumulation of about [illegible]% diunsaturated fatty
acids and 15% ricinoleic acid, confirming the
mixed function of this enzyme (7). Cells
expressing m7FAD2 accumulated ricinoleic acid
to ~0.5% of total fatty acids and had ~50%
reduction in the accumulation of diunsaturated
fatty acids (Fig. 1). Thus, replacement of the
seven residues (10) converted a strict desaturase
to a bifunctional desaturase-hydroxylase
comparable in activity to the unmodified
[text lost] LFAH12 [text lost], possibly because
linoleic acid is more stable than ricinoleic acid
in yeast cells. In cells expressing m7LFAH12,
the ratio of 18:2 diunsaturated fatty acid to
ricinoleic acid was, on average, 43 times that in
cells expressing LFAH12. There was a
[illegible]-fold increase in the ratio of 16:2
diunsaturated fatty acid to ricinoleic acid.
Notwithstanding the quantitative limitations of the
[text lost] indicate a major increase in desaturase
activity and a decrease in hydroxylase activity upon
introduction of the seven desaturase-equivalent
residues into LFAH12.

Fig. 1. Fatty acid composition of yeast cells
expressing desaturase and hydroxylase genes.
Cultures were induced in growth medium
containing galactose, ~2 × 10[exponent illegible] cells were
harvested, and fatty acids were extracted and
modified for analysis by gas chromatography,
as described (7). Values are the averages
(±SE) obtained from five cultures of
independent transformants.

The activity of the mutant enzymes in planta
was examined by using the corresponding genes
to produce stable transgenic plants in an
Arabidopsis fad2 mutant, which is deficient in oleate
desaturase activity [citation illegible]. Expression of LFAH12
under transcriptional control of the constitutive
cauliflower mosaic virus (CaMV) 35S promoter
resulted in accumulation of high concentrations
of hydroxy fatty acids in seeds (7), but no
detectable suppression of the fad2 mutant
phenotype in leaves (Fig. 2). In contrast, expression of
m7LFAH12 under the same circumstances
resulted in complete suppression of the fad2
phenotype in 8 out of 10 transgenic plants analyzed
(Fig. 2). There was an average 21-fold increase
in the ratio of linoleate to oleate in leaf fatty acids
and a small increase in the amount of linolenic
acid. These results, which are consistent with the
results of the yeast assays, confirm that
expression of m7LFAH12 in plants deficient in oleate
desaturation has identical phenotypic
consequences to expressing a wild-type desaturase
such as FAD2 (13).

To evaluate the effect of the seven
mutations on the activity of the gene encoding
FAD2, we expressed FAD2 and m7FAD2 in
the Arabidopsis fad2 mutant under the control
of the strong seed-specific promoter from the B.
rapa napin gene. As expected from previous
studies (7), none of the 15 transgenic lines
expressing the FAD2 gene accumulated
detectable hydroxy fatty acids, although the ratio of
linoleate to oleate accumulation was increased
an average of 10-fold as compared with
untransformed controls. In the transgenic lines
expressing m7FAD2, the amount of
hydroxylated fatty acids, which included ricinoleic,
densipolic, and lesquerolic acids, composed up to
9.4% of total seed fatty acids (Fig. 3). The ratio
of seed linoleate to oleate contents was
increased an average of 6.4-fold [citation illegible], which
indicated that m7FAD2 exhibited significant
desaturase activity, albeit less than the wild-type
FAD2 gene. The high concentrations of
hydroxy fatty acid accumulation observed in
transgenic plants expressing m7FAD2 indicated
that the modified desaturase had comparable
levels of hydroxylase activity, in the in planta
assay, to the native Lesquerella hydroxylase.

To determine whether any single amino acid residue of the seven had a major effect on the ratio of hydroxylase to desaturase activity, we introduced each of the seven FAD2-equivalent residues (5) individually into the LFAH12 enzyme. None of the enzymes containing single amino acid substitutions had activities that differed significantly from the wild-type hydroxylase enzyme when expressed in yeast (14). We also tested seven modified LFAH12 genes containing all combinations of six desaturase-equivalent residues (Fig. 4). Each of the seven constructs produced a ratio of diunsaturated to hydroxylated fatty acids that was similar to the ratio produced by the m7LFAH12 enzyme. Thus, as few as six residues principally determine the ratio of desaturation or hydroxylation activity. All lines showed somewhat reduced levels of desaturase activity, with the largest reductions of -^''i seen in F218Y and G105A. Therefore, we made a construct in which both these changes were combined (xF218Y G105A). This construct exhibited similar activity to the individual F218Y and G105A mutants (14), suggesting that their effects are redundant and that the observed changes in activity result from interactions of more than two of the seven residues. Considered together, these results indicate that no single amino acid position plays an essential role in catalytic outcome. Rather, changes in activity result from a combined effect of several amino acid positions that have partially overlapping effects.
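
The count of seven constructs follows from simple combinatorics: omitting one of seven substitutions at a time gives exactly seven six-substitution variants. A minimal sketch of that enumeration, using hypothetical placeholder labels (`sub1` to `sub7`) rather than the actual residue names:

```python
from itertools import combinations

# Hypothetical placeholder labels for the seven substitutions
# (these are NOT the residue names used in the paper).
substitutions = [f"sub{i}" for i in range(1, 8)]

# Each construct carries six of the seven substitutions, i.e. omits exactly one.
constructs = list(combinations(substitutions, 6))

for construct in constructs:
    omitted = (set(substitutions) - set(construct)).pop()
    print(f"construct lacking {omitted}: {', '.join(construct)}")

print(len(constructs))
```

Each printed line corresponds to one construct of the kind named with the X designation in Fig. 4, that is, the full set of changes minus a single position.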

Because four of the seven amino acids are adjacent to histidine residues that have been identified as essential to catalysis (1), we hypothesized that these four residues may be of greatest importance to the outcome of the reaction. A modified FAD2 enzyme, designated m4FAD2, was constructed in which these four amino acids




were replaced by their equivalents from the Lesquerella hydroxylase (TI4SS. A296V. S32:a. M324H. Expression of m4FAD2 in seeds of wild-type Arabidopsis resulted in the accumulation of average concentrations of hydroxy fatty acids that were similar to those obtained with m7FAD2 (Fig. 3). Thus, only four changes are required to convert a strict desaturase to an enzyme that is also an efficient hydroxylase.

Fig. 2. Genetic complementation of the Arabidopsis fad2 mutation with the m7LFAH12 gene. Measurements were made of the fatty acid composition of leaf lipids from wild-type, the fad2 mutant, and transgenic fad2 plants expressing LFAH12 or m7LFAH12 under the control of the CaMV 35S promoter. Values are means ± SE (n = 3).

Fig. 3. Fatty acid content of seed lipids from independent transgenic Arabidopsis lines expressing m7FAD2 or m4FAD2 under control of the B. napus napin promoter. Abbreviations: ricinoleic acid (18:1-OH), densipolic acid (18:2-OH), and lesquerolic acid (20:1-OH).

Fig. 4. Contribution of individual amino acid substitutions to the activity of the modified Lesquerella hydroxylase. Seven derivatives of the m7LFAH12 gene containing all combinations of six out of seven substitutions were introduced into yeast cells, and the fatty acid composition of five independent cultures was measured. The X designation refers to the unmodified amino acid (that is, enzyme XI325M contains all of the seven substitutions except I325M).

Biochemical and structural similarities between the desaturase and hydroxylase, in addition to recent kinetic isotope experiments, suggest that there is a common initial oxidation event at C-12 for both enzymes (15). Thus, it seems likely that the different functional outcomes represent a partitioning between two reaction pathways that diverge after initial C-12 hydrogen abstraction such that one pathway favors a second hydrogen abstraction whereas the other favors oxygen transfer. We envision that because no specific single amino acid change is required, and in view of the substantial effect of the four residues that abut the active site histidines, the difference between desaturase and hydroxylase outcome is influenced by changes in active site geometry. Examples of such changes might include the relative positioning of the substrate with respect to the iron center, the coordination geometry of the iron ions, or the active site hydrogen bonding network. Whatever the case, this mode of evolving new catalytic activity differs from the more general case in which the evolution of new activities involves the incorporation of new catalytic groups into the active site (16).

Acetylenic and epoxy fatty acids are produced by desaturation and epoxidation of double bonds by enzymes that are structurally similar to the enzymes described here ( / ). Thus, variations of the same catalytic center can catalyze the formation of at least four different functional groups in fatty acids. Because various combinations of these four functional groups define most of the chemical complexity found among the hundreds of different fatty acids that occur in higher plants (.'), it is now apparent that most of the chemical complexity of plant fatty acids can be accounted for by divergence of a small number of desaturases. Extrapolating from the results described here, it also seems very likely that a small number of amino acid substitutions will account for the functional divergence of desaturases, hydroxylases, epoxygenases, and acetylenic bond-forming enzymes.

The Function of myo-Inositol in the Biosynthesis of Raffinose

Purification and Characterization
of Galactinol:Sucrose 6-Galactosyltransferase from Vicia faba Seeds

Ludwig Lehle and Widmar Tanner
Fachbereich Biologie der Universität Regensburg
(Received April 18/June 28, 1973)



1. An enzyme from Vicia faba seeds is described which transfers the galactosyl moiety of galactinol to sucrose giving rise to raffinose and myo-inositol.

2. The enzyme was purified about 400-fold through 6 steps. A molecular weight of 80000 has been determined by gel-filtration and of 100000 by glycerol density gradient centrifugation.

3. The enzyme galactinol:sucrose 6-galactosyltransferase is different from α-galactosidase; these two activities as well as the stachyose-synthesizing enzyme separate during purification.

4. The transferase showed a high acceptor specificity. Out of 10 acceptors tested a transfer only to sucrose took place. This transfer was 5 times faster than the hydrolysis of galactinol. Galactinol, p-nitrophenyl-α-D-galactopyranoside and raffinose, but not UDP-galactose, could act as donors.

5. The enzyme catalyzes an exchange reaction between raffinose and [14C]sucrose. This partial reaction is less sensitive towards heat inactivation and SH-poisons than the total reaction.

6. The pH-optimum of the reaction was found to be pH 7.0, the temperature optimum 42 °C. Heat inactivation could be prevented to some extent by galactinol and raffinose. In the presence of 0.4 mM sucrose the Km-value for galactinol was 7 mM and for raffinose 10 mM. For sucrose a Km-value of 1 mM in the synthesis reaction has been determined.

7. The transferase activity is high enough to explain the synthesis rate in vivo of all the raffinose-type sugars present in the seeds.

8. The physiological meaning of the results as well as the metabolic function of myo-inositol is discussed.



One of the major exceptions to Leloir's mechanism [1] of glycosidic linkage formation in nature has been discovered in the biosynthesis of a group of plant oligosaccharides, the sugars of the raffinose family [2,3]. Besides sucrose these sugars are the most common and widespread ones in higher plants and have a function as storage and transport material [4-6]. Whereas evidence in vivo and in vitro [2,7-9] has firmly established that the biosynthesis of stachyose and verbascose proceeds via a transglycosylation of the galactosyl moiety from galactinol [L-1-(O-α-D-galactopyranosyl)-myo-inositol] to raffinose and stachyose, respectively [Eqns (3) and (4) below], conflicting evidence has been published concerning the biosynthesis of raffinose, the smallest member of the homologous series of those oligosaccharides.

On the one hand evidence for the reaction sequence (1) and (2), analogous to stachyose and verbascose synthesis, has been presented [10]. On the other hand a transfer of the galactosyl moiety from UDP-galactose to sucrose has been reported [11-13]. However, in this case the enzyme preparations were fairly crude and the possibility cannot be excluded that the sum of reaction (1) and (2) has been measured. Reaction (1) has been originally described by Frydman and Neufeld [14].

UDP-galactose + myo-inositol → galactinol + UDP (1)
Galactinol + sucrose ⇌ raffinose + myo-inositol (2)
Galactinol + raffinose ⇌ stachyose + myo-inositol (3)
Galactinol + stachyose ⇌ verbascose + myo-inositol (4)

Abbreviation. Gal-αONp, p-nitrophenyl-α-D-galactopyranoside.
Trivial Name. Galactinol, L-1-(O-α-D-galactopyranosyl)-myo-inositol.
Enzymes. α-Galactosidase or α-D-galactoside galactohydrolase (EC 3.2.1.22); galactinol:raffinose 6-galactosyltransferase (EC 2.4.1.-); aldolase or fructose-1,6-bisphosphate D-glyceraldehyde-3-phosphate lyase (EC 4.1.2.13).

In the report to follow a 400-fold purification of the galactinol:sucrose 6-galactosyltransferase, the enzyme catalyzing reaction (2), from Vicia faba seeds will be described. The enzyme also catalyzes an exchange reaction between raffinose and sucrose, which is considerably more stable than the reaction responsible for net synthesis of raffinose. This latter observation explains the fact that Moreno and Cardini [15] have been able to observe only the exchange reaction in wheat germ extracts.

MATERIALS AND METHODS
Purification Procedure

All procedures were carried out at about 4 °C.

Step 1. Preparation of Crude Extract. 200 g ripe seeds from Vicia faba were powdered in a Waring Blendor and then extracted in a chilled mortar in two portions each with 200 ml of 0.1 M Tris-HCl buffer pH 7.3 containing 5 mM dithioerythritol. The homogenate was centrifuged for 30 min at 27000 × g giving a clear supernatant of about 250 ml.

Step 2. Treatment with Protamine Sulfate. The supernatant was brought to a protein concentration of 60 mg/ml with the same buffer as used for step 1. A 2% protamine sulfate solution was added to a final ratio of 9 mg protamine sulfate per 100 mg protein. After 30 min of stirring, the resulting precipitate was centrifuged off and discarded.

Step 3. Ammonium Sulfate Fractionation. To the protamine-treated supernatant saturated, cold ammonium sulfate solution, pH 7.3, was slowly added, with continuous stirring, to give 33% saturation. After 30 min, the precipitate was separated by centrifugation and the supernatant was brought to 55% saturation. The pellet obtained after centrifugation was dissolved in 70 ml 0.1 M Tris-HCl pH 7.3 containing 5 mM dithioerythritol and dialyzed overnight against 3 l of 0.05 M Tris-HCl pH 7.5, containing 1 mM dithioerythritol.

Step 4. Column Chromatography on DEAE-Cellulose. The dialyzed enzyme solution was adsorbed on a DEAE-cellulose column (2.6 × 30 cm) which had been equilibrated with 0.01 M Tris-HCl pH 7.5 containing 0.05 M KCl and 1 mM dithioerythritol. After the column was washed with equilibration buffer until all protein not bound was removed, a linear gradient of 0.05 M KCl to 0.2 M KCl in 0.01 M Tris-HCl with 1 mM dithioerythritol was used for elution. Fractions of 6 ml were collected and those with the highest specific activity were pooled and concentrated to a small volume in an Aminco ultrafiltration cell with filter No XM-60.

Step 5. Sephadex G-200 Gel Chromatography. The pooled and concentrated fractions were loaded onto a column (2.6 × 80 cm) of Sephadex G-200, equilibrated with 0.01 M Tris-HCl buffer pH 7.5 containing 0.1 M KCl and 2 mM dithioerythritol. The column was eluted at a flow rate of 4 ml/h; 2-ml fractions were collected and the active fractions (100-120) were pooled and concentrated as described before.

Step 6. Hydroxyapatite Chromatography. After dialysis against 0.01 M Tris-HCl with 2 mM dithioerythritol pH 7.5, the enzyme solution was applied to a column (2.5 × 13 cm) of hydroxyapatite, which had been equilibrated with 0.01 M potassium phosphate buffer pH 7.5 containing 2 mM dithioerythritol. Elution was carried out stepwise with 100 ml potassium phosphate buffer of the following concentrations: (a) 0.01 M; (b) 0.05 M; (c) 0.1 M; (d) 0.2 M. The enzyme was eluted with 0.2 M buffer. The active fractions were again concentrated as described above.

Tests for Enzymic Activities
Galactosyltransferase: Synthesis and Exchange Reaction. Two tests have been used to measure the transfer of the galactosyl moiety from galactinol to sucrose. In test I the amount of [14C]raffinose formed from [14C]sucrose has been determined. The incubation mixture contained in a total volume of 50 µl: 5 µmol Tris-HCl pH 7.2, 1 µmol galactinol, 0.02 µmol [14C]sucrose (35 µCi/µmol) and enzyme. After incubation of 1-4 h at 32 °C the reaction was stopped with 0.2 ml ethanol and the preparation was centrifuged; the supernatant fluid was separated on Whatman No 1 in the solvent system n-butanol-pyridine-water-acetic acid (60:40:30:3, v/v/v/v). Radioactive spots were located with a strip scanner, cut out, and measured directly on paper in a scintillation counter in toluene-2,5-diphenyloxazole (efficiency 70%). This test was also applied for the exchange reaction with the only exception that 0.5 µmol raffinose was used instead of galactinol. The linear relationship between product formation, protein concentration up to 4 mg and incubation time up to 6 h has already been demonstrated for both reactions [10] and has since also been shown to be valid for the more purified enzyme preparations used in the kinetic experiments.

Test II is based on the galactosyl transfer from 14C-labelled galactinol to sucrose. With this test one can study in addition the amount of galactose set free by the hydrolyzing activity of the transferase. The incubation mixture contained in a total volume of 60 µl: 5 µmol Tris-HCl pH 7.2, 0.013 µmol [14C]galactinol (7 µCi/µmol), 0.5 µmol sucrose and enzyme. The chromatographic separation was carried out in a solvent system of α-picoline-ammonia-water (70:28:2, v/v/v) until the front had reached half way down the paper. Then a second run in the solvent system n-butanol-pyridine-acetic acid-water (60:40:3:30, v/v/v/v) followed. Other conditions were the same as in test I.

α-Galactosidase. The enzyme was assayed by following the initial rate of p-nitrophenyl-α-D-galactopyranoside (Gal-αONp) hydrolysis. Enzyme solution was incubated at 32 °C with 25 µmol potassium phosphate buffer pH 5.5 and 60 µmol Gal-αONp for 15 min. The reaction was stopped by adding 5.0 ml of cold 0.1 M Na2CO3 and the yellow colour of p-nitrophenol was measured at 406 nm. Controls with Gal-αONp as well as with protein alone were run concurrently and all values appropriately corrected.

Determinations of Molecular Weight
The molecular weight was determined on a Sephadex G-200 column (2.5 × 80 cm) according to Andrews [16]. The column was eluted with 0.01 M Tris-HCl pH 7.5 containing 0.1 M KCl and 2 mM dithioerythritol. The calibration was obtained by determination of the elution volumes of a number of reference proteins of known molecular weight. The sedimentation constant of the enzyme was determined by centrifugation through a linear 5-ml gradient ranging from 5-20% glycerol in 0.05 M Tris-HCl pH 7.5 containing 5 mM dithioerythritol. The samples were centrifuged in the SWL 50 rotor of a Spinco L 2-65 B for 14 h at 0 °C. Then the tubes were punctured and fractions of drops collected with the aid of a fraction collector. As reference protein aldolase was used.



Polyacrylamide-Gel Electrophoresis
The purity of the various purification steps was routinely checked by polyacrylamide gel electrophoresis in a 7.5% acrylamide gel according to Maurer [17]. Electrophoresis was performed at 2.0 mA/tube until the bromphenol blue band had reached the bottom of the tube. Fixation and staining were carried out according to Chrambach et al. [18].

Other Procedures
Protein determinations were carried out according to Lowry et al. [19] with bovine serum albumin as a standard. Labelled galactinol was isolated by paper chromatography from the water-soluble extract of Lamium leaves after photosynthesis in 14CO2 according to Kandler [20]. A sample of galactinol was generously supplied by McCready (USDA, Agricultural Research Service, Albany).

RESULTS
Purification of Galactinol:Sucrose 6-Galactosyltransferase
Table 1 summarizes the results of the overall purification. Starting from a crude extract which had a specific activity of 0.071 nmol × mg⁻¹ × h⁻¹ a preparation was obtained with a specific activity of 29.8 nmol × mg⁻¹ × h⁻¹ (peak II of hydroxyapatite chromatography). The results show that the enzyme catalyzing the synthesis of stachyose [8] separates from the corresponding raffinose-synthesizing enzyme. In addition it has to be pointed out that the purified enzyme is different from an α-galactosidase, since the hydrolyzing activity towards p-nitrophenyl-α-D-galactopyranoside (Gal-αONp), known to be a good substrate for α-galactosidases, separates likewise from the raffinose-synthesizing activity. Since the galactinol:sucrose 6-galactosyltransferase is the most labile of the plant galactosyltransferases known (e.g. ), it seems unlikely that an inactivation instead of a separation of the other two enzymes had occurred during the purification. The considerable decrease of the α-galactosidase activity in step 2 may on the other hand be the reason for the observed increase of the total raffinose-synthesizing activity in this fraction, since less of the newly synthesized raffinose will be lost by hydrolysis.
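
The "about 400-fold" purification quoted for this preparation can be checked from the two specific activities alone. The short sketch below reads them as 0.071 and 29.8 nmol × mg⁻¹ × h⁻¹; the variable names are illustrative and are not taken from the paper.

```python
# Specific activities read from the text (nmol raffinose per mg protein per hour).
crude_specific_activity = 0.071
purified_specific_activity = 29.8  # peak II of the hydroxyapatite step

# Fold purification is the ratio of final to initial specific activity.
fold_purification = purified_specific_activity / crude_specific_activity
print(f"fold purification: {fold_purification:.0f}")
```

The ratio comes out slightly above 400, consistent with the rounded figure given in the summary.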

The preparation from Vicia faba also catalyzes an exchange reaction between raffinose and sucrose according to the following equation:

Raffinose + [14C]sucrose ⇌ [14C]raffinose + sucrose.

This reaction has originally been described by Moreno and Cardini [15]; their enzyme preparation from wheat germ, however, did not catalyze the synthesis of raffinose. Through all the steps given in Table 1 (except for step 6; see Discussion below) the exchange reaction parallels the synthesis activity. Thus both reactions most likely are catalyzed by one and the same enzyme.

In the last purification step two active transferase peaks (I and II in Fig. 1) were obtained. The main fraction, peak II, was eluted with a buffer concentration of 0.2 M. Peak I, which had much lower specific activity, appeared at 0.1 M. Both fractions were able to catalyze the synthesis as well as the exchange reaction, although at different relative rates. Whereas peak I catalyzes the exchange reaction about 10 times faster than the synthesis of raffinose, peak II catalyzes the exchange reaction only at 85% the rate of synthesis reaction. Further experiments indicated that peak I is a modified form of the enzyme, which has lost most of its raffinose-synthesizing activity and shows a different elution behaviour as compared to the native enzyme. Thus, when peak II was chromatographed a second time on hydroxyapatite, again an active peak I and II was obtained. The observation made previously, that the activity for raffinose synthesis is lost more readily than the activity of the exchange reaction [10], is in agreement with the above finding.

When checked for purity by polyacrylamide gel electrophoresis the 400-fold purified fraction was not yet homogeneous; one major and three minor bands have been observed (Fig. 2). Although a strong attempt has been made to correlate the enzyme activity with one of the bands, this has failed; the enzymic activity always got lost during gel electrophoresis, even in the presence of a variety of protecting agents.

The enzyme remained in the supernatant when the enzyme solution was centrifuged at 100000 × g for 1 h.

Determination of Molecular Weight
The molecular weight of the enzyme was determined by two different methods. From the sedimentation profile in a glycerol density gradient a molecular weight of 100000 was obtained when compared to the sedimentation of aldolase (Fig. 3). With Sephadex G-200 gel chromatography on a standardized column (Fig. 4) a value of 80000 was determined. In each case, however, the same values were observed, whether synthesis or exchange activity had been tested.

Fig. 3. Sedimentation profile of galactinol:sucrose 6-galactosyltransferase in a 5-20% glycerol density gradient.

Stability

When stored at 4 °C the crude extract lost ijO"/,, of its original activity in the synthesis reaction and iiO'i/n in exchange reaction within I! days. The activity of the purified enzyme when frozen was unchanged for at least a month.

pH Optimum
The enzyme showed an optimum around pH 7.0. In the presence of potassium phosphate buffer the activity was higher than in the presence of Tris-HCl buffer (Fig. 5).

out under standard conditions for 150 min



Effect of Temperature on the Enzyme Activity
Fig. 6 shows the temperature profile of the enzyme activities. Maximum rate for both reactions occurs at 42 °C, with a sharp drop beyond 44 °C, the synthesis reaction being somewhat more sensitive than the exchange reaction. In this connection it has been observed that galactinol and raffinose prevent to some extent inactivation by heat (Table 2). Sucrose, however, at the concentrations used had no effect.

Inhibition with Sulfhydryl-Specific Reagents
One of the main reasons that the first step in the biosynthesis of raffinose sugars escaped detection for a rather long time has certainly been the requirement of the enzyme for strong SH-protecting agents [10]. This is especially true for the synthesis reaction. The different susceptibility of synthesis and exchange reaction is also reflected by the inhibition of the enzyme with iodoacetamide and N-ethylmaleimide (Table 3). The heavy metal ions Ag+, Hg2+, Zn2+ and Al3+ at a concentration of 1 mM inhibited the synthesis reaction of the enzyme to 100%; Mn2+ inhibited to 60%.

Enzyme Kinetics
Km values for galactinol, sucrose and raffinose have been determined (Fig. 7, Fig. 8 and Table 4). The Michaelis constant for sucrose was found to be 1 mM in the synthesis reaction and 2.9 mM in the exchange reaction in the presence of 0.02 M galactinol and raffinose, respectively. When the galactinol and raffinose concentrations were decreased 100-fold, the Km for sucrose in the synthesis reaction stayed the same (1.4 mM). It was, however, considerably lower (0.47 mM) in the exchange experiment. This is consistent with the assumption that the binding site for raffinose and sucrose might be identical; a high raffinose concentration would then act as competitive inhibitor. On the other hand the sites for galactinol and sucrose seem to be different; a change in the concentration of galactinol has no influence on the Km of sucrose. It has to be pointed out that the Km-values for galactinol and raffinose given in Table 4 are only valid for a sucrose concentration of 0.4 mM.
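
The competitive-inhibition reading of the exchange data can be made concrete with a rough calculation. The sketch below is an illustration only and is not part of the paper: it assumes the textbook relation Km,app = Km (1 + [I]/Ki), treats raffinose purely as a competitive inhibitor of sucrose binding, and takes the two raffinose concentrations as 20 mM (0.02 M) and 0.2 mM (100-fold lower). The names `km_true` and `ki` are introduced here for the example.

```python
def solve_competitive(km_app_high, km_app_low, inhibitor_high, inhibitor_low):
    """Solve Km_app = Km * (1 + [I]/Ki) for Km and Ki from two measurements.

    All concentrations are in mM. This is the simple competitive model only.
    """
    ratio = km_app_high / km_app_low
    # ratio = (Ki + I_high) / (Ki + I_low)  =>  Ki = (I_high - ratio * I_low) / (ratio - 1)
    ki = (inhibitor_high - ratio * inhibitor_low) / (ratio - 1.0)
    km_true = km_app_low / (1.0 + inhibitor_low / ki)
    return km_true, ki


# Apparent Km of sucrose in the exchange reaction: 2.9 mM at 20 mM raffinose,
# 0.47 mM at 0.2 mM raffinose (values from the text; concentrations as assumed above).
km_true, ki = solve_competitive(2.9, 0.47, 20.0, 0.2)
print(f"Km without raffinose ~ {km_true:.2f} mM, Ki for raffinose ~ {ki:.1f} mM")
```

Under these assumptions the two measurements are mutually consistent with an inhibition constant for raffinose of a few millimolar. Because raffinose is also the galactosyl donor in the exchange reaction, the real kinetics are more involved, so the numbers should be read as an order-of-magnitude check rather than a fitted result.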

Acceptor and Donor Specificity
The acceptor specificity has been tested by measuring the transfer of the 14C-labelled galactosyl moiety from [14C]galactinol to various acceptors. Out of 10 acceptors tested only a transfer to sucrose could be observed (Table 5). The purified enzyme cannot catalyze the biosynthesis of stachyose and verbascose. Both those enzymic activities have already been found in seeds from Vicia faba [8]. It should be noted that during the incubation of [14C]galactinol some free [14C]galactose was obtained due to the hydrolysis of galactinol. However, in the presence of sucrose the amount of galactose transferred was nearly 5 times greater than the amount of galactinol hydrolyzed (Table 5). In the absence of any acceptor considerably more galactose was set free. This can be interpreted as a competition of sucrose with water. As donors only galactinol, Gal-αONp (an unphysiological substrate) and raffinose, i.e. in the exchange reaction, work to a significant extent (Table 5). Transfer from UDP-galactose to sucrose has been observed neither with the purified enzyme nor the crude extract [10].

Fig. 7. Lineweaver-Burk plot; abscissa 1/[S] (mM⁻¹), in the presence of 0.4 mM sucrose.

Fig. 8. Lineweaver-Burk plots: exchange reaction.

Table 4. Km-values of galactinol:sucrose 6-galactosyltransferase.

Table 5. Acceptor and donor specificity (Sephadex fraction).

DISCUSSION

The enzyme catalyzing the transfer of the galactosyl moiety from galactinol to sucrose has been isolated, purified and characterized. The results indicate that the enzyme is clearly different from any of the α-galactosidases described [5,21-25]. Thus the hydrolyzing activity towards Gal-αONp, a typical substrate for α-galactosidases, separates from the raffinose-synthesizing enzyme during the purification. Furthermore the high substrate specificity as well as the efficiency of the transfer have to be pointed out, when the enzyme is compared with α-galactosidases. It is proposed to call the enzyme galactinol:sucrose 6-galactosyltransferase and to group it among the glycosyl transferases.

The exchange reaction is catalyzed by the same enzyme which is responsible for raffinose synthesis. This has also been expected in analogy to similar transfer reactions [7,2(),27].

The enzyme activity of 8.9 nmol raffinose formed × h⁻¹ × g seeds⁻¹ (Table 1) corresponds to an activity of 28.4 nmol × h⁻¹ × g seeds⁻¹ at the physiological sucrose concentration of 10 mM. This rate is high enough to explain the synthesis rate in vivo for raffinose and for all the other higher homologues of the raffinose sugars during the ripening period. Thus the enzyme is able to synthesize 2.5 µmol raffinose, the amount actually present in 1 g of seeds, in less than 4 days.

The synthesis of the total amount of the other raffinose-type sugars (21.4 µmol/g seed) would take about one month, which corresponds reasonably well to the ripening period of the seeds.
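
The arithmetic behind these estimates can be retraced in a few lines. The sketch below is a reconstruction under stated assumptions, not the authors' calculation: it assumes simple Michaelis-Menten kinetics with the Km of 1 mM for sucrose, and it takes the assay sucrose concentration as 0.4 mM (0.02 µmol in an assumed 50 µl incubation volume). Function and constant names are illustrative.

```python
def michaelis_menten_fraction(substrate_mM, km_mM):
    """Fraction of Vmax reached at a given substrate concentration."""
    return substrate_mM / (km_mM + substrate_mM)


KM_SUCROSE = 1.0             # mM, synthesis reaction (from the text)
ASSAY_SUCROSE = 0.02 / 0.05  # mM: 0.02 umol in 0.05 ml (assumed assay volume)
PHYSIOL_SUCROSE = 10.0       # mM, physiological concentration quoted in the text
RATE_PHYSIOL = 28.4          # nmol raffinose per hour per g seeds at 10 mM sucrose

# Factor by which the rate rises between assay and physiological sucrose.
scale = (michaelis_menten_fraction(PHYSIOL_SUCROSE, KM_SUCROSE)
         / michaelis_menten_fraction(ASSAY_SUCROSE, KM_SUCROSE))
rate_assay = RATE_PHYSIOL / scale


def days_to_synthesize(amount_umol, rate_nmol_per_h):
    """Days needed to make amount_umol at a constant rate in nmol per hour."""
    return amount_umol * 1000.0 / rate_nmol_per_h / 24.0


days_raffinose = days_to_synthesize(2.5, RATE_PHYSIOL)
days_others = days_to_synthesize(21.4, RATE_PHYSIOL)

print(f"scale factor {scale:.2f}, rate under assay conditions {rate_assay:.1f} nmol/h/g")
print(f"raffinose: {days_raffinose:.1f} days; other raffinose-type sugars: {days_others:.0f} days")
```

With these inputs the extrapolation factor is a little over 3, and the two synthesis times come out at under 4 days and roughly one month, in line with the statements in the text.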

In addition the results of the biosynthesis of raffinose and its higher homologues in vitro with respect to the function of galactinol are in agreement with the studies in vivo by Senser and Kandler [2,2()]. It seems without doubt now that the biosynthesis of all the raffinose sugars proceeds via galactinol. The physiological meaning of the detour taken by the galactosyl moiety is not understood at present. Perhaps it has to be seen in relation to the observation that myo-inositol and galactinol inhibit α-galactosidases, enzymes responsible for the decomposition of raffinose sugars.

myo-Inositol has been known as a growth factor for yeasts and many tissue cultures [31-33] for a long time. Since these cells do not contain sugars of the raffinose family the cofactor-like role, which myo-inositol plays in the biosynthesis of oligosaccharides, cannot explain its function as a growth factor. It seems likely, however, that myo-inositol is absolutely required in the form of phosphatidylinositols, which seem to be indispensable membrane components [34]. This is supported by the finding that transport mechanisms are impaired when cells lack myo-inositol [35-37].

We would like to thank Drs. A. Bock and H. Kosakowski for helpful suggestions and advice.

Abstract Raffinose (O-α-D-galactopyranosyl-(1→6)-O-α-D-glucopyranosyl-(1↔2)-O-β-D-fructofuranoside) is a widespread oligosaccharide in plant seeds and other tissues. Raffinose synthase (EC 2.4.1.82) is the key enzyme that channels sucrose into the raffinose oligosaccharide pathway. We here report on the isolation of a cDNA encoding for raffinose synthase from maturing pea (Pisum sativum L.) seeds. The coding region of the cDNA was expressed in Spodoptera frugiperda Sf21 insect cells. The recombinant enzyme, a protein of glycoside hydrolase family 36, displayed similar kinetic properties to raffinose synthase partially purified from maturing seeds by anion-exchange and size-exclusion chromatography. Apart from the natural galactosyl donor galactinol (O-α-D-galactopyranosyl-(1→1)-L-myo-inositol), p-nitrophenyl α-D-galactopyranoside, an artificial substrate, was utilized as a galactosyl donor. An equilibrium constant of 4.1 was determined for the galactosyl transfer reaction from galactinol to sucrose. Steady-state kinetic analysis suggested that raffinose synthase is a transglycosidase operating by a ping-pong reaction mechanism and may also act as a glycoside hydrolase. The enzyme was strongly inhibited by 1-deoxygalactonojirimycin, a potent inhibitor for α-galactosidases (EC 3.2.1.22). The physiological implications of these observations are discussed.

Keywords cDNA cloning ■ Enzyme kinetics • 
Galactinol • Pisum ■ Raffinose synthase • Seed 
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Introduction 

Raffinose (0-a-D-galactopyranosyl-(l ->6)-0-a-D-gluco- 
pyranosyl-(l<->2)-C>-j5-D-fructofuranoside) and its higher 
homologue stachyose are major soluble carbohydrates 
in seeds, roots and tubers of many plant species (Avigad 
and Dey 1997). In the Lamiaceae, Cucurbitaceae, 
Oleaceae and other plant families, they are the pre- 
dominant carbohydrates translocated in the phloem. 
Apart from carbon transport and storage, these oligo- 
saccharides may function as protective agents during 
maturation drying of seeds (Horbowicz and Obendorf 
1994) and during cold stress (Gilmour et al. 2000). The 
biosynthesis of raffinose was initially proposed to pro- 
ceed by galactosyl transfer from UDP-galactose to su- 
crose (Bourne et al. 1965; Pridham and Hassid 1965; 
Imhoff' 1973), but the direct nucleotide pathway has 
been disputed (Lehle and Tanner 1973; Bachmann et al. 
1994). It is now generally accepted that raffinose is 
synthesized by the following reaction sequence: 

UDP - galactose-)-7njo-inositol UDP -|- galactinol 

(1) 

Galactinol -|- sucrose mjo-inositol + raffinose (2) 

Reactions 1 and 2 are catalyzed by galactinol synthase 
(EC 2.4.1.123) and raffinose synthase (EC 2.4.1.82), re- 
spectively. Raffinose in turn is the substrate for stachy- 
ose synthase (EC 2.4.1.67), which adds a further 
galactose unit from galactinol: 

Galactinol + raffinose ?=i m_);o-inositol -F stachyose (3) 

Galactinol synthases and stachyose synthases have been 
characterized and corresponding cDNA sequences have 
been cloned from several sources (for review, see 
Peterbauer and Richter 2001). In contrast, raffinose 
synthase has attracted little attention, probably because it 




is the most labile enzyme of the pathway. Since the pio- 
neering work of Lehle and Tanner (1973), who purified the 
enzyme 400-fold from seeds of Vicia faba, raffinose syn- 
thase has only been characterized in a crude preparation 
from leaves of Ajuga reptans (Bachmann et al. 1994), al-
though genes encoding raffinose synthases have been
reported in patents (Oosumi et al. 1998; Watanabe and 
Oeda 1998). At the amino acid level, these sequences show 
homology to stachyose synthases (Peterbauer et al. 1999, 
2002) and seed imbibition proteins of unknown enzymatic 
function (Anderson and Kohorn 2001; Romo et al. 2001). 

We have recently described the changes in the activity 
of raffinose synthase and other enzymes of the pathway
during seed development of pea cultivars (Peterbauer 
et al. 2001). Here we describe the isolation and heterologous
expression of a cDNA encoding a raffinose
synthase from developing pea seeds. The kinetic prop- 
erties of the recombinant enzyme are compared with 
those of a partially purified raffinose synthase prepara- 
tion. We demonstrate that pea raffinose synthase is a 
transglycosidase with structural and biochemical 
similarities to α-galactosidases.



Materials and methods 

Plant material and chemicals 

Seeds of pea (Pisum sativum L. cv. Wunder von Kelvedon) were
obtained from a local supplier (Austrosaat, Vienna, Austria). 
Plants were grown in commercial potting soil in a growth chamber 
at 22/18 °C day/night temperature and 50/80% relative humidity 
with a 16-h photoperiod. Seeds were harvested 20-30 days after 
flowering and were stored in liquid nitrogen. 

Galactinol (O-α-D-galactopyranosyl-(1→1)-L-myo-inositol),
D-ononitol (1D-4-O-methyl-myo-inositol), galactosyl ononitol
(O-α-D-galactopyranosyl-(1→3)-4-O-methyl-D-myo-inositol) and
galactopinitol A (O-α-D-galactopyranosyl-(1→2)-4-O-methyl-D-chiro-inositol)
were available from previous studies (Wanek and
Richter 1995; Richter et al. 1997; Peterbauer et al. 1998). Sucrose and
D-galactose were obtained from Fluka (Vienna, Austria). Stachyose
was from Merck (Vienna, Austria) and D-pinitol
(1D-3-O-methyl-chiro-inositol) was from Aldrich (Vienna, Austria). Raffinose,
myo-inositol, p-nitrophenyl α-D-galactopyranoside and
1-deoxygalactonojirimycin (1,5-dideoxy-1,5-imino-D-galactitol) were from
Sigma (Vienna, Austria).



Reverse transcription-polymerase chain reaction (PCR) 
and rapid amplification of cDNA ends 

Total RNA was extracted with an RNeasy Plant Mini kit and 
poly(A)+ RNA was isolated with an Oligotex mRNA kit (Qiagen, 
Hilden, Germany). First-strand cDNA was synthesized from 
poly(A)+ RNA using AMV reverse transcriptase and an oligo-(dT)
primer. Amplification by PCR was performed with HotStar
Taq DNA polymerase (Qiagen) and degenerate primers. The
sense primer 5'-TT(T/C)GGITGGTG(C/T)ACITGGGA(T/C)GC-3'
(where I denotes inosine) was based on the amino acid
sequence motif FGWCTWDA. The antisense primer
5'-CCAICCI(G/C)CICC(C/T)TG(A/G)CA(G/A)TT(G/A)AA-3' was based
on the motif FNCQG(A/G)GW. PCR products were isolated from
agarose gels, cloned into the pCR2.1-TOPO vector (Invitrogen,
Lofer, Austria) and sequenced using an ABI Prism BigDye Ter- 
minator Cycle Sequencing Mix and an ABI Prism 310 sequencer 
(Applied Biosystems, Vienna, Austria). RNA ligase-mediated rapid 



amplification of the missing cDNA ends was performed with the 
GeneRacer system (Invitrogen) as suggested by the manufacturer. 
The 5'-end was amplified with the gene-specific primer
5'-CGGTTCATTCCATCTCGCTCTGTAA-3'. The 3'-end was amplified
with the gene-specific primer 5'-TTGTTTTGCCCGACGGTTCTATCTT-3'
followed by nested PCR with the gene-specific primer
5'-GTCAACATTACGCACTCCCTACACGA-3'. PCR products 
were cloned and sequenced as described above. The assembled 
cDNA sequence was deposited in the EBI database under the 
accession number AJ426475. 
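The relationship between the degenerate sense primer and the amino acid motif it was designed from can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it expands each degenerate position of the sense primer, translates every complete codon with the standard genetic code, and reports the encoded residues. Treating inosine (I) as "any base" is a simplifying assumption, and the function names are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def parse_degenerate(primer):
    """Split a primer such as 'TT(T/C)GGI' into one set of bases per position."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":            # alternatives written as (T/C)
            end = primer.index(")", i)
            positions.append(set(primer[i + 1:end].split("/")))
            i = end + 1
        elif primer[i] == "I":          # inosine: assumed here to pair with any base
            positions.append(set(BASES))
            i += 1
        else:
            positions.append({primer[i]})
            i += 1
    return positions


def encoded_residues(primer):
    """Return, for each complete codon, the set of amino acids it can encode."""
    pos = parse_degenerate(primer)
    residues = []
    for k in range(0, len(pos) - len(pos) % 3, 3):
        codons = product(*pos[k:k + 3])
        residues.append({CODON_TABLE["".join(c)] for c in codons})
    return residues


SENSE_PRIMER = "TT(T/C)GGITGGTG(C/T)ACITGGGA(T/C)GC"
residues = encoded_residues(SENSE_PRIMER)
# "X" would flag a codon whose degeneracy spans more than one amino acid.
peptide = "".join(sorted(r)[0] if len(r) == 1 else "X" for r in residues)
print(peptide)
```

The seven complete codons cover the first seven residues of the motif FGWCTWDA; the trailing bases GC are the first two positions of the alanine codon, which the sketch leaves untranslated.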



Expression of raffinose synthase in baculovirus-infected insect cells

The open reading frame encoded by the putative raffinose synthase 
cDNA was amplified from reverse-transcribed poly(A)+ RNA using
Pfu DNA polymerase (Promega, Mannheim, Germany) and the
primers 5'-CTGCAGGCACCACCAAGCATAAC-3' (sense) and
5'-GGTACCCATGAGGGATCAAAATAAAAAC-3' (antisense).
The PCR product was cloned into pCR2.1-TOPO and sequenced.
Restriction sites for PstI (CTGCAG) and KpnI (GGTACC) were included in the
PCR primers for subsequent subcloning of the fragment, in frame, 
into the baculovirus expression vector pVTBacHis-1 (Sarkar et al. 
1998). Co-transfection of the expression vector with linearized viral 
DNA and amplification of recombinant baculovirus in Spodoptera 
frugiperda insect cells was performed as previously described (Mucha 
et al. 2001). Infected Sf21 insect cells expressing the recombinant 
protein under control of the polyhedrin promoter were lysed in 
50 mM Na-phosphate (pH 7.0), 1 mM DTT, containing a set of 
protease inhibitors (Complete Protease Inhibitor Cocktail; Roche, 
Vienna, Austria). Cell lysates were desalted by repeated ultrafiltra- 
tion in 50 mM Na-phosphate (pH 7.0), 1 mM DTT, using Centricon 
Plus-20 ultrafiltration units (Millipore, Vienna, Austria), and 
assayed for activity. Western blot analysis of crude cell culture 
supernatants with monoclonal antibodies against an enterokinase 
site provided by the expression vector was performed as previously 
described (Mucha et al. 2001). 



Partial purification of raffinose synthase from pea seeds 

All steps were carried out at 4 °C unless otherwise stated. Maturing 
seeds (95 g) were immersed in liquid nitrogen, ground to a fine 
powder and suspended in 150 ml of 50 mM Hepes-NaOH
(pH 7.0), 20 mM MgCl₂, 2.5 mM EGTA, 0.5 mM DTT, 1%
polyvinylpolypyrrolidone. The extract was further homogenized 
with a Polytron tissue homogenizer, filtered through one layer of 
fine-mesh nylon and centrifuged at 26,000 g for 30 min. The supernatant
was adjusted with stirring to 2 mg ml⁻¹ protamine sulfate
by dropwise addition of a 10% (w/v) protamine sulfate solution (to 
prevent clogging of columns in subsequent chromatography) in 
50 mM Hepes-NaOH (pH 7.0). Precipitated material (nucleic ac- 
ids, some storage proteins and other contaminants) was removed 
by centrifugation at 26,000 g for 20 min. The cleared supernatant 
was subjected to fractionation with solid ammonium sulfate. Pro- 
teins precipitating between 35 and 55% saturation were collected 
by centrifugation at 26,000 g for 20 min, dissolved in 20 mM bis- 
Tris propane-HCl (pH 6.8), 0.5 mM DTT, and dialyzed overnight 
against this buffer. The sample was loaded onto an anion-exchange 
column (90 ml, 2.5 cm i.d.) of Macro-Prep High Q (Bio-Rad) 
maintained at 12 °C. Bound protein was eluted at a flow rate of 
5 ml min⁻¹ with a linear gradient (750 ml) of 0-250 mM NaCl in
20 mM bis-Tris propane-HCl (pH 6.8), 0.5 mM DTT. Active
fractions were pooled, concentrated by ultrafiltration, and applied
at a flow rate of 0.5 ml min⁻¹ to a Superdex 200 HR 10/30 size-
exclusion chromatography column (Amersham Pharmacia Biotech, 
Vienna, Austria) equilibrated with 20 mM Na-phosphate (pH 7.0), 
1 mM DTT, 150 mM NaCl. Active fractions were pooled, 
concentrated by ultrafiltration, and stored in aliquots in liquid
nitrogen. 



Enzyme and protein assay 



Raffinose synthase activity was routinely determined at 30 °C in
reaction mixtures (20 µl) containing 50 mM Na-phosphate
(pH 7.0), 1 mM DTT, 10 mM galactinol and 20 mM sucrose. 
Enzyme samples were diluted and incubation times were adjusted 
to allow transformation of not more than 10% of the substrates 
into products. Reactions were stopped by boiling for 5 min. Re- 
action mixtures were diluted to 0.5 ml and centrifuged for 5 min 
at 12,000 g. Formation of raffinose and galactose (arising by 
hydrolysis of galactinol) was determined by HPLC with pulsed 
amperometric detection using a Carbopac PA-10 column
(250 mm long, 2 mm i.d.; Dionex, Vienna, Austria) as previously
described (Peterbauer et al. 2002). Formation of myo-inositol and
galactinol (a product in the reverse reaction) was determined by
HPLC using a Carbopac MA-1 column (Dionex) thermostatted at
25 °C. Sugars were eluted with 150 mM NaOH at a flow rate of
0.4 ml min⁻¹. After each run, the column was washed with 1 M
NaOH to elute raffinose. Synthesis of galactosyl ononitol and 
galactopinitol A by galactosyl transfer from galactinol to D-on- 
onitol and D-pinitol, respectively, was determined by capillary gas 
chromatography as previously described (Peterbauer et al. 1998; 
Hoch et al. 1999). The concentration of soluble protein was de- 
termined by the dye-binding procedure (Bradford 1976) using 
BSA as a standard. 



Steady-state kinetic analysis and equilibrium

For the determination of kinetic constants of raffinose synthesis, 
samples were incubated with varying concentrations of the first 
substrate at several fixed concentrations of the second substrate. 
Data were fitted to the initial rate equation for a ping-pong Bi Bi 
mechanism: 

$$v = \frac{V_{\max}[A][B]}{K_{mB}[A] + K_{mA}[B] + [A][B]} \qquad (4)$$



where v is the initial velocity, V_max is the maximum velocity, [A]
and [B] are the concentrations of the substrates, and K_mA and K_mB
are Michaelis constants for A and B, respectively. Inhibition patterns
were determined graphically by replots of slopes and intercepts
of primary double-reciprocal plots. Inhibition constants were
estimated by fitting the untransformed data to Eq. 5 or 6, corresponding
to linear competitive or mixed (non-competitive) inhibition,
respectively, or to Eq. 7, corresponding to a hyperbolic
uncompetitive inhibition pattern (Cleland 1963):



$$v = \frac{V_{\max}[S]}{K_m(1 + [I]/K_{ic}) + [S]} \qquad (5)$$



cDNA cloning and analysis of pea raffinose synthase 

To isolate a cDNA encoding for raffinose synthase by 
reverse transcription-PCR, degenerate oligonucleotide 
primers were designed based on amino acid motifs 
conserved among Cucumis sativus raffinose synthase, 
stachyose synthases and related sequences (Peterbauer 
et al. 1999). To distinguish between raffinose synthase 
and stachyose synthase, the primers were chosen to 
encompass a block of about 80 amino acids, which is 
exclusively present in stachyose synthases. Two PCR
products of about 1.2 and 1.4 kbp, respectively, were 
obtained from mRNA isolated from maturing seeds. 
Upon sequence analysis, the longer fragment revealed 
100% identity with pea stachyose synthase (Peterbauer 
et al. 2002). The 5'- and the 3'-end of the 1.2-kbp frag- 
ment were extended by RNA ligase-mediated rapid 
amplification. The composed sequence of 2,652 nucleo- 
tides contains an open reading frame of 2,394 nucleo- 
tides encoding for a polypeptide of 798 amino acids with 
a calculated molecular mass of 88.7 kDa. All other 
methionine codons were found to be in-frame with the 
putative start codon. 

A pattern search using the BLOCKS database
(Henikoff et al. 2000) revealed the presence of the glycoside
hydrolase superfamily GH-D signature. According
to the sequence-based classification of Henrissat
(Henrissat and Davies 2000), this superfamily is formed
by α-galactosidases from glycoside hydrolase families 27
and 36 (Dagnall et al. 1995). An alignment of one of the
characteristic motifs with members of the GH-D superfamily
is shown in Fig. 1. It contains a conserved
aspartic acid residue, which acts as a catalytic nucleophile
to generate a covalent glycosyl-enzyme intermediate
in α-galactosidases of family 27 (Hart et al. 2000; Ly
et al. 2000). High overall sequence homology with seed
imbibition proteins (SIPs) and stachyose synthases (not
shown) places the pea protein into the related glycoside
hydrolase family 36 (for database entries, see
http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html).



$$v = \frac{V_{\max}[S]}{K_m(1 + [I]/K_{ic}) + [S](1 + [I]/K_{iu})} \qquad (6)$$

$$v = \frac{V_{\max}[S]}{K_m + [S](1 + [I]/K_{iu})/(1 + [I]/K_{iu}')} \qquad (7)$$

where [S] is the concentration of the variable substrate, [I] is the
concentration of the inhibitor, K_ic is a competitive inhibition constant,
and K_iu and K_iu' are uncompetitive inhibition constants.
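The linear inhibition models of Eqs. 5 and 6 can be written as short rate functions. The sketch below is illustrative only: the V_max and K_m values are arbitrary assumptions, while the K_ic of 0.189 µM is the apparent value reported in this paper for 1-deoxygalactonojirimycin with respect to galactinol, and the inhibitor concentrations are those listed for Fig. 4. It shows how a competitive inhibitor scales the apparent K_m by the factor (1 + [I]/K_ic).

```python
def rate_competitive(s, i, vmax, km, kic):
    """Eq. 5: linear competitive inhibition."""
    return vmax * s / (km * (1 + i / kic) + s)


def rate_mixed(s, i, vmax, km, kic, kiu):
    """Eq. 6: linear mixed (non-competitive) inhibition."""
    return vmax * s / (km * (1 + i / kic) + s * (1 + i / kiu))


def apparent_km_factor(i, kic):
    """Factor by which a competitive inhibitor raises the apparent K_m."""
    return 1 + i / kic


# Assumed, illustrative kinetic constants (not taken from the paper).
VMAX = 100.0       # arbitrary rate units
KM = 5.0           # mM
# Apparent K_ic reported in the text (189 nM), expressed in µM.
KIC_UM = 0.189
# Inhibitor concentrations listed for Fig. 4, in µM.
INHIBITOR_UM = [0.0, 0.1, 0.5, 2.0]

for conc in INHIBITOR_UM:
    factor = apparent_km_factor(conc, KIC_UM)
    v = rate_competitive(10.0, conc, VMAX, KM, KIC_UM)
    print(f"[I] = {conc:.1f} uM  apparent Km x {factor:.2f}  v at 10 mM = {v:.1f}")
```

Setting K_iu to infinity in `rate_mixed` recovers the competitive case, which is a quick consistency check between the two equations.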

To estimate K_eq of raffinose synthesis, the partially purified
enzyme (230 pkat) was incubated in a final volume of 75 µl with
40 mM galactinol and sucrose, 40 mM raffinose and myo-inositol,
or with all four substrates (20 mM each), respectively. At intervals, 
aliquots were removed and analyzed by HPLC. Kinetic and ther- 
modynamic constants are given as means ± SE. 



[Fig. 1 alignment panel. Sequence labels and accession numbers: PsRFS (AJ426475), CsRFS (AF073744), PsSTS (AJ311087), BbGAL (accession illegible), CaGAL (L27992), PcGAL (AF246263). The aligned residues are not recoverable.]


Fig. 1. Partial alignment of pea (Pisum sativum) raffinose synthase
with selected members of the glycoside hydrolase superfamily
GH-D. PsRFS, P. sativum raffinose synthase; CsRFS, Cucumis
sativus raffinose synthase; PsSTS, P. sativum stachyose synthase;
BbGAL, Bifidobacterium breve α-galactosidase; CaGAL, Coffea
arabica α-galactosidase (preprotein); PcGAL, Phanerochaete
chrysosporium α-galactosidase (preprotein). EBI/GenBank accession
numbers are shown in parentheses. The catalytic aspartic acid
residue in C. arabica and P. chrysosporium α-galactosidase is
marked by an asterisk. The alignment was generated using
CLUSTAL W (Thompson et al. 1994)

Expression of raffinose synthase in Sf21 insect cells



The coding region of the cDNA was engineered into a 
baculovirus expression vector and inserted into the 
baculovirus genome by homologous recombination. 
Desalted lysates of insect cells infected with recombinant 
virus displayed raffinose synthase activity (0.82 pkat mg⁻¹
soluble protein), while no synthesis of raffinose was 
detected in lysates of uninfected control cells. The 
recombinant protein, which was fused to a leader se- 
quence containing a honeybee melittin signal peptide for 
secretion, a hexahistidine tag and an enterokinase 
cleavage site, was also detected in the culture medium of 
infected insect cells by Western blot analysis with 
monoclonal antibodies against the enterokinase site 
(Fig. 2). However, several attempts to purify the protein 
from the medium by chromatography on iminodiacetic 
acid-Sepharose charged with Ni²⁺ failed. Only traces of
activity were recovered, probably because Ni²⁺ destabilizes
the enzyme. When insect cell lysates were incubated
for 1 h with 1 mM NiCl₂ in solution, activity
decreased to 30.2% of untreated controls. The steady- 
state kinetics of raffinose synthesis were therefore ana- 
lyzed using lysates of insect cells. A double-reciprocal 
plot revealed a set of apparently parallel lines (Fig. 3). 
This pattern is characteristic of a ping-pong Bi Bi 
mechanism, in which a glycosyl-enzyme intermediate of 
any kind is formed and the first product dissociates 
before the second substrate is bound. Kinetic constants 
were estimated as described in Materials and methods 
(Table 1). Due to endogenous α-galactosidase activity in
insect cells, no attempts were made to characterize 
hydrolytic activity of the recombinant protein (see below 
for further analysis). 
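The parallel-line pattern follows directly from the ping-pong rate equation (Eq. 4): taking reciprocals gives 1/v = (K_mA/V_max)(1/[A]) + (1/V_max)(1 + K_mB/[B]), so the slope does not depend on [B]. A minimal numerical sketch of this point is given below. The kinetic constants are assumed, illustrative values rather than the fitted parameters of Table 1; the sucrose concentrations are those listed for Fig. 3.

```python
def ping_pong_rate(a, b, vmax, km_a, km_b):
    """Eq. 4: v = Vmax[A][B] / (KmB[A] + KmA[B] + [A][B])."""
    return vmax * a * b / (km_b * a + km_a * b + a * b)


def double_reciprocal_line(b, vmax, km_a, km_b, a_values):
    """Slope and intercept of 1/v versus 1/[A] at a fixed [B]."""
    x = [1.0 / a for a in a_values]
    y = [1.0 / ping_pong_rate(a, b, vmax, km_a, km_b) for a in a_values]
    # 1/v is exactly linear in 1/[A], so two points define the line.
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    intercept = y[0] - slope * x[0]
    return slope, intercept


# Assumed, illustrative constants (A = galactinol, B = sucrose).
VMAX = 2.0      # pkat per mg
KM_A = 7.0      # mM
KM_B = 20.0     # mM
GALACTINOL_MM = [2.0, 5.0, 10.0, 20.0]
SUCROSE_MM = [10.0, 13.0, 20.0, 40.0, 80.0]   # levels listed for Fig. 3

lines = [double_reciprocal_line(b, VMAX, KM_A, KM_B, GALACTINOL_MM)
         for b in SUCROSE_MM]
for b, (slope, intercept) in zip(SUCROSE_MM, lines):
    print(f"sucrose {b:5.1f} mM  slope {slope:.4f}  intercept {intercept:.4f}")
```

Every line has the same slope, K_mA/V_max, while the intercept falls as the sucrose concentration rises, which is the pattern described for Fig. 3.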

Formation of raffinose was also observed when 
galactinol was replaced by galactosyl ononitol, a methy- 
lated derivative of galactinol, or by the artificial substrate 
p-nitrophenyl α-D-galactopyranoside (Table 2). On the






Fig. 2. Heterologous expression of pea raffinose synthase in 
Spodoptera frugiperda insect cells infected with recombinant 
baculovirus. Crude cell culture supernatants were subjected to 
Western blot analysis with monoclonal mouse antibody raised 
against the enterokinase recognition sequence fused to the 
recombinant protein. Bound antibody was visualized with anti- 
mouse antibodies conjugated to horseradish peroxidase and 
chemiluminescence detection. Lane 1 Supernatant from infected
insect cells, lane 2 supernatant from uninfected control cells 



other hand, sucrose could be replaced by D-ononitol and
D-pinitol, yielding galactosyl ononitol and a galactopini- 
tol, respectively. Raffinose itself was not utilized as an 
acceptor. 



Partial purification and characterization of raffinose 
synthase from maturing seeds 

After sample clean-up by treatment with protamine 
sulfate and ammonium sulfate fractionation, raffinose 
synthase was partially purified from a pea seed extract 
by anion- and size-exclusion chromatography. The final 
preparation had a specific activity of 75.4 pkat mg⁻¹
protein. The K_m values for galactinol and sucrose were
experimentally indistinguishable from those determined 
for the recombinant raffinose synthase (Table 1). Relative
activities towards other donor and acceptor substrates
were also similar (Table 2). myo-Inositol acted as
a linear mixed product inhibitor with respect to galact- 
inol, while linear competitive inhibition was observed 
with respect to sucrose as the varied substrate (Table 3). 
Partially purified raffinose synthase was strongly inhib- 




[Fig. 3 plot: x-axis 1/[Galactinol] (mM⁻¹)]

Fig. 3. Initial velocity pattern of raffinose synthesis catalyzed by
recombinant pea raffinose synthase in lysates of insect cells. The
concentrations of sucrose were 10 mM (filled circles), 13 mM (open
circles), 20 mM (filled squares), 40 mM (open squares) or 80 mM
(triangles). Lines represent the best fit to Eq. 4



Table 1. Kinetic parameters for the synthesis of raffinose and hydrolysis
of galactinol catalyzed by insect cell lysates expressing the
recombinant enzyme and by raffinose synthase partially purified
from pea (Pisum sativum) seeds, respectively. Kinetic constants
were estimated by non-linear regression as described in Materials
and methods

Reaction                     (pkat mg⁻¹)
Insect cell lysates
  Raffinose                  2.0 ± 0.1
Partially purified enzyme
  Raffinose                  199.2 ± 7.8
  Galactinol hydrolysis      27.3 ± 0.3

7.3 ± 0.5
1.0 ± 0.1

Table 2. Substrate specificity of recombinant raffinose synthase
determined in lysates of insect cells and of a partially purified
raffinose synthase preparation from maturing pea seeds. Reaction
mixtures contained galactosyl donors and acceptors at concentrations
of 10 and 20 mM, respectively. n.d. Not detected



Donor                                 Acceptor     Relative activity (%)
                                                   Insect cell    Partially purified
                                                   lysates        raffinose synthase
Galactinol                            Sucrose      100.0          100.0
Galactinol                            Raffinose    n.d.           n.d.
Galactinol                            D-Ononitol   86.6           105.1
Galactinol                            D-Pinitol    42.3           32.6
Galactosyl ononitol                   Sucrose      67.1           68.3
p-Nitrophenyl α-D-galactopyranoside   Sucrose      33.5           39.4


ited by 1-deoxygalactonojirimycin (Fig. 4). Inhibition 
was found to be competitive with respect to galactinol, 
with an apparent K_ic of 189 ± 14 nM. Stachyose
(50 mM) had no effect on raffinose synthase activity. 

The enzyme exhibited an optimum at pH 7.0 (Fig. 5). 
At the optimum of pea α-galactosidase (pH 4.5; Peterbauer
et al. 2001), very little hydrolysis of galactinol was
detected, suggesting that the preparation was essentially
free of acidic α-galactosidases. However, some hydrolysis
occurred around pH 7.0. When the rate of release
of galactose at pH 7.0 was considered as the reaction 
rate, sucrose acted as linear mixed inhibitor (Fig. 6a). 
Inhibition constants are compiled in Table 3. When the 
rate of release of myo-inositol at several fixed levels of
sucrose was plotted, parallel lines were obtained 
(Fig. 6b). A replot of intercepts of the primary double- 
reciprocal plot as a function of the sucrose concentration 
revealed hyperbolic activation. These patterns of inhi- 
bition of the hydrolytic activity together with those of 
the transfer reaction are unique for a ping-pong mech- 
anism where an unstable glycosyl-enzyme complex is 
formed¹ (Cleland 1963, 1970). In other words, the intermediary
glycosyl-enzyme complex either reacts with
sucrose to give raffinose, or hydrolyzes to give galactose 
and free enzyme. The observation that sucrose acted as 
an activator (and not as an inhibitor) with respect to the 
rate of myo-inositol formation indicates that the release 
of raffinose is faster than hydrolysis of the glycosyl- 
enzyme complex. 



Equilibrium of raffinose synthesis 

To estimate K_eq for raffinose synthesis, the enzyme was
incubated with varying concentrations of substrates and 



¹This special ping-pong mechanism predicts convergent rather than
parallel lines in double-reciprocal plots of the rates of raffinose 
formation. Within substrate ranges used here, however, hydrolysis 
of galactinol was too slow, as compared with the transfer reaction, 
to allow detection of convergence experimentally (see Fig. 3). 



products (Fig. 7). Starting from either side of the reac- 
tion, mass action ratios approached similar values. 
From the substrate and product concentrations obtained 
after 5 h, a mean mass action ratio of 4.1 ±0.6 was 
calculated for the synthesis reaction. Since hydrolysis of 
substrates was slow compared with the transfer reactions,
it is reasonable to assume that K_eq is close to 4.
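The statement that the mass action ratio approaches the same value from either side of the reaction can be illustrated with a small calculation. The sketch below assumes K_eq = 4 for reaction 2, neglects hydrolysis (as the text argues is reasonable), and uses the three starting mixtures described for Fig. 7. The bisection solver and function names are illustrative, not part of the original analysis.

```python
def mass_action_ratio(galactinol, sucrose, inositol, raffinose):
    """[myo-inositol][raffinose] / ([galactinol][sucrose]) for reaction 2."""
    return (inositol * raffinose) / (galactinol * sucrose)


def equilibrate(galactinol, sucrose, inositol, raffinose, k_eq, tol=1e-9):
    """Find equilibrium concentrations (mM) by bisection on the extent x.

    Positive x means net raffinose synthesis; negative x means net reversal.
    """
    lo = -min(inositol, raffinose)
    hi = min(galactinol, sucrose)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # f(x) increases monotonically with x over the allowed range.
        f = ((inositol + mid) * (raffinose + mid)
             - k_eq * (galactinol - mid) * (sucrose - mid))
        if f > 0:
            hi = mid
        else:
            lo = mid
    x = (lo + hi) / 2
    return galactinol - x, sucrose - x, inositol + x, raffinose + x


K_EQ = 4.0  # assumed from the mass action ratio of about 4 reported in the text

# Starting mixtures (mM): galactinol, sucrose, myo-inositol, raffinose.
starts = [(40.0, 40.0, 0.0, 0.0), (20.0, 20.0, 20.0, 20.0), (0.0, 0.0, 40.0, 40.0)]
finals = [equilibrate(*s, K_EQ) for s in starts]
for s, f in zip(starts, finals):
    print(s, "->", tuple(round(c, 2) for c in f),
          "ratio", round(mass_action_ratio(*f), 3))
```

Because the three mixtures contain the same total amounts of galactosyl donor and acceptor, they converge on the same composition under these assumptions: roughly 13.3 mM each of galactinol and sucrose and 26.7 mM each of myo-inositol and raffinose.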



Discussion 

A cDNA encoding for raffinose synthase was isolated 
from maturing pea seeds and functionally expressed in 
insect cells. The kinetic properties of the recombinant 
protein were similar to those of partially purified raffi- 
nose synthase (Table 1), providing good evidence that 
the cloned cDNA corresponds to the enzyme expressed 
in developing seeds. The synthesis reaction was reversible
with a K_eq of about 4 (Fig. 7). This value is very
similar to the equilibrium of stachyose synthesis (Tanner 
and Kandler 1968). Like the corresponding enzyme from 
Vicia faba (Lehle and Tanner 1973), pea raffinose 
synthase displayed an optimum at pH 7.0 (Fig. 5). 
Steady-state kinetics (Fig. 3) and product inhibition by 
myo-inositol (Table 3) suggested that the synthesis of 
raffinose proceeds by a ping-pong mechanism. This 
mechanism explains isotopic exchange between raffinose 
and labelled sucrose, which has been shown to be 
associated with raffinose synthase activity (Lehle and 
Tanner 1973; Castillo et al. 1990). Remarkably, pea 
raffinose synthase was able to utilize D-ononitol and 
D-pinitol as acceptors (Table 2). Galactosyl transfer to 
these methylated inositols is similar to an exchange reaction
between galactinol and myo-inositol, which is also
expected to occur in a ping-pong mechanism. However, 
it has so far been believed that these reactions are ex- 
clusively catalyzed by stachyose synthases (Peterbauer 
and Richter 2001). Our results indicate that the ability to 
utilize various inositol derivatives is a more common 
feature of enzymes of the raffinose oligosaccharide 
pathway. 

Direct evidence for hydrolytic activity of raffinose 
synthase towards galactinol could not be provided, be- 
cause we were unable to purify recombinant protein. 
However, indirect support came from steady-state ki- 
netic analysis of galactinol hydrolysis catalyzed by the 
partially purified protein (Fig. 6). A dual role of raffi- 
nose synthase as a transglycosidase with some hydrolytic 
activity is in line with a classification of raffinose syn- 
thase as a glycoside hydrolase of family 36. Members of 
a family are likely to share structural topology of the 
active site and a common catalytic mechanism, although 
overall sequence homologies may be weak (Henrissat 
and Davies 2000). Indeed, all α-galactosidases so far
studied, as well as stachyose synthases of family 36, have 
been shown to operate by a ping-pong mechanism 
(Peterbauer and Richter 1998; Brumer et al. 1999; Van 
Laere et al. 1999; Peterbauer et al. 2002). It is interesting 
to note that the amino acid residue, which forms the 



Table 3. Inhibition patterns and inhibition constants for partially
purified raffinose synthase from maturing pea seeds. Inhibition
constants were determined by fitting the data to the corresponding
initial rate equations as described in Materials and methods. C
Linear competitive inhibition, M linear mixed (non-competitive)
inhibition, hU hyperbolic uncompetitive inhibition



Variable substrate   Product        Inhibitor      Pattern   K_ic (mM)      K_iu (mM)      K_iu' (mM)
Sucrose              Raffinose      myo-Inositol   C         10.1 ± 0.9ᵃ
Galactinol           Raffinose      myo-Inositol   M         22.3 ± 4.1ᵇ    23.2 ± 2.6ᵇ
Galactinol           Galactose      Sucrose        M         3.7 ± 0.5      22.8 ± 2.8
Galactinol           myo-Inositol   Sucrose        hU                       18.7 ± 1.3     2.8 ± 0.2

ᵃApparent inhibition constant determined in the presence of 10 mM galactinol
ᵇApparent inhibition constant determined in the presence of 20 mM sucrose




[Fig. 4 plot: x-axis 1/[Galactinol] (mM⁻¹)]

Fig. 4. Inhibition of partially purified raffinose synthase from
maturing pea seeds by 1-deoxygalactonojirimycin. Assays contained
2-20 mM galactinol and 20 mM sucrose. The concentrations
of 1-deoxygalactonojirimycin were 0.0 µM (filled circles),
0.1 µM (open circles), 0.5 µM (filled squares) or 2.0 µM (open
squares). Lines represent the best fit to Eq. 5




[Fig. 6 plots: x-axis 1/[Galactinol] (mM⁻¹)]

Fig. 6a, b. Initial velocity patterns for hydrolysis of galactinol by
partially purified raffinose synthase from maturing pea seeds in the
presence of several fixed concentrations of sucrose. a Release of
galactose. Lines represent the best fit to Eq. 6. b Formation of
myo-inositol. Lines represent the best fit to Eq. 7. The concentrations of
sucrose were 0 mM (filled circles), 10 mM (open circles), 13 mM
(filled squares), 20 mM (open squares), 40 mM (filled triangles) or
80 mM (open triangles)




Fig. 5. Effect of pH on the formation of raffinose (filled circles)
and release of galactose (open circles) in reaction mixtures
containing partially purified raffinose synthase from maturing pea
seeds, 10 mM galactinol and 20 mM sucrose in McIlvaine buffer
(0.2 M Na₂HPO₄ adjusted to various pH values with 0.1 M citric
acid). Data were adjusted relative to the maximum activity
measured

covalent intermediate in α-galactosidases of the related
glycoside hydrolase family 27, is conserved among 
members of family 36 (Fig. 1). 

Two other biochemical properties of the enzyme 
further support a structural similarity of the active site 
of raffinose synthase and α-galactosidases. Like the
corresponding enzyme from Vicia faba (Lehle and 




[Fig. 7 plots: x-axis Time (h)]

Fig. 7a-c. Equilibrium of raffinose synthesis. Partially purified
raffinose synthase from pea seeds was incubated at 30 °C with
40 mM galactinol and 40 mM sucrose (filled circles), 20 mM of
each galactinol, myo-inositol, sucrose and raffinose (squares), or
40 mM myo-inositol and 40 mM raffinose (open circles), respectively.
Aliquots were removed at the times indicated and analyzed
by HPLC. Changes in the levels of substrates and products are
shown for raffinose (a), myo-inositol (b) and galactose (c)



Tanner 1973), pea raffinose synthase utilized p-nitrophenyl
α-D-galactopyranoside, an artificial substrate
with high affinity towards α-galactosidases, as a galactosyl
donor (Table 2). The enzyme was also strongly
inhibited by 1-deoxygalactonojirimycin (Fig. 4), a potent
competitive inhibitor for α-galactosidases (Asano
et al. 2000; Martin et al. 2001). These findings may be of
more general relevance. A large number of α-galactosidases
with acidic or alkaline pH optima have been
identified using nitrophenyl derivatives as substrates (for 
reviews, see Dey 1985; Peterbauer and Richter 2001). 
With the exception of a few α-galactosidases of bacterial
origin (van den Broek et al. 1999; Van Laere et al. 1999), 
none of these enzymes has been rigorously tested for 
transglycosidase activities with natural substrates, be- 
cause it is tacitly assumed that proteins, which act on 
nitrophenyl glycosides in vitro, function as hydrolases in 
vivo. Our results suggest that this assumption may be 
misleading. Raffinose synthase, for example, would be 
recognized as neutral α-galactosidase if assayed only
with p-nitrophenyl galactopyranoside or galactinol.
However, its hydrolytic activity is probably of little 
physiological significance, because it is strongly inhibited 
by sucrose (Fig. 6a), which is usually present in high 
concentration in plant tissues. 

Inhibition of raffinose synthase by 1-deoxygalacton- 
ojirimycin is of particular interest, because this inhibitor 
has been successfully used to modulate activity of 
α-galactosidase in human lymphoblasts (Asano et al.
2000). Hence, it could be possible to use 1-deoxygalactonojirimycin
or related α-galactosidase inhibitors to
experimentally manipulate the content of raffinose oligosaccharides
in vivo. A more detailed characterization
of enzymes involved in raffinose oligosaccharide synthesis
with respect to inhibition by 1-deoxygalactonojirimycin
is currently in progress.

Deciphering the Message in Protein Sequences:
Tolerance to Amino Acid Substitutions

James U. Bowie, John F. Reidhaar-Olson, Wendell A. Lim,
Robert T. Sauer



An amino acid sequence encodes a message that deter- 
mines the shape and function of a protein. This message is 
highly degenerate in that many different sequences can 
code for proteins with essentially the same structure and
activity. Comparison of different sequences with similar 
messages can reveal key features of the code and improve 
understanding of how a protein folds and how it per- 
forms its function. 



THE GENOME IS MANIFEST LARGELY IN THE SET OF PROTEINS
that it encodes. It is the ability of these proteins to fold
into unique three-dimensional structures that allows them to
function and carry out the instructions of the genome. Thus,
comprehending the rules that relate amino acid sequence to structure
is fundamental to an understanding of biological processes.
Because an amino acid sequence contains all of the information
necessary to determine the structure of a protein (1), it should be
possible to predict structure from sequence, and subsequently to
infer detailed aspects of function from the structure. However, both
problems are extremely complex, and it seems unlikely that either
will be solved in an exact manner in the near future. It may be
possible to obtain approximate solutions by using experimental data
to simplify the problem. In this article, we describe how an analysis
of allowed amino acid substitutions in proteins can be used to
reduce the complexity of sequences and reveal important aspects of
structure and function.



Methods for Studying Tolerance to 
Sequence Variation 

There are two main approaches to studying the tolerance of an
amino acid sequence to change. The first method relies on the
process of evolution, in which mutations are either accepted or
rejected by natural selection. This method has been extremely
powerful for proteins such as the globins or cytochromes, for which
sequences from many different species are known (2-7). The second
approach uses genetic methods to introduce amino acid changes at
specific positions in a cloned gene and uses selections or screens to
identify functional sequences. This approach has been used to great
advantage for proteins that can be expressed in bacteria or yeast,
where the appropriate genetic manipulations are possible (3, 8-11).
The end results of both methods are lists of active sequences that can
be compared and analyzed to identify sequence features that are
essential for folding or function. If a particular property of a side
chain, such as charge or size, is important at a given position, only
side chains that have the required property will be allowed. Conversely,
if the chemical identity of the side chain is unimportant,
then many different substitutions will be permitted.

Studies in which these methods were used have revealed that 
proteins are surprisingly tolerant of amino acid substitutions (2-4, 
11). For example, in studying the effects of approximately 1500 
single amino acid substitutions at 142 positions in lac repressor, 
Miller and co-workers found that about one-half of all substitutions 
were phenotypically silent (11). At some positions, many different 
nonconservative substitutions were allowed. Such residue positions 
play little or no role in structure and function. At other positions, no 
substitutions or only conservative substitutions were allowed. These 
residues are the most important for lac repressor activity. 

What roles do invariant and conserved side chains play in 
proteins? Residues that are directly involved in protein functions 
such as binding or catalysis will certainly be among the most 
conserved. For example, replacing the Asp in the catalytic triad of 
trypsin with Asn results in a 10^4-fold reduction in activity (12). A 
similar loss of activity occurs in λ repressor when a DNA binding 
residue is changed from Asn to Asp (13). To carry out their 
function, however, these catalytic residues and binding residues 
must be precisely oriented in three dimensions. Consequently, 
mutations in residues that are required for structure formation or 
stability can also have dramatic effects on activity (10, 14-16). 
Hence, many of the residues that are conserved in sets of related 
sequences play structural roles. 



Substitutions at Surface and Buried Positions 

In their initial comparisons of the globin sequences, Perutz and 
co-workers found that most buried residues require nonpolar side 
chains, whereas few features of surface side chains are generally 
conserved (6). Similar results have been seen for a number of protein 
families (2, 4, 5, 7, 17, 18). An example of the sequence tolerance at 
surface versus buried sites can be seen in Fig. 1, which shows the 
allowed substitutions in λ repressor at residue positions that are near 
the dimer interface but distant from the DNA binding surface of the 
protein (9). These substitutions were identified by a functional 
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the appropriate hydrophobic residues, a significant fracti 
acceptable. Hence, the hydrophobicity of a sequence 
more information about its potential acceptability in the core than 
does the total side chain volume. Steric compatibility was intermediate 
between volume and hydrophobicity in informational impor- 



The Informational Importance of Surface Sites 

We have noted that many surface sites can tolerate a wide variety 
of side chains, including hydrophilic and hydrophobic residues. This 
result might be taken to indicate that surface positions contain little 
structural information. However, Bashford et al., in an extensive 
analysis of globin sequences (4), found a strong bias against large 
hydrophobic residues at many surface positions. At one level, this 
may reflect constraints imposed by protein solubility, because large 
patches of hydrophobic surface residues would presumably lead to 
aggregation. At a more fundamental level, protein folding requires a 
partitioning between surface and buried positions. Consequently, to 
achieve a unique native state without significant competition from 
other conformations, it may be important that some sites have a 
decided preference for exterior rather than interior positions. As a 
result, many surface sites can accept hydrophobic residues individ- 
ually, but the surface as a whole can probably tolerate only a 
moderate number of hydrophobic side chains. 



Implications for Structure Prediction 

At present, the only reliable method for predicting a low-resolution 
tertiary structure of a new protein is by identifying 
sequence similarity to a protein whose structure is already known 
(29, 30). However, it is often difficult to align sequences as the level 
of sequence similarity decreases, and it is sometimes impossible to 
detect statistically significant sequence similarity between distantly 
related proteins. Because the number of known sequences is far 
greater than the number of known structures, it would be advantageous 
to increase the reach of the available structural information by 
improving methods for detecting distant sequence relations and for 
subsequently aligning these sequences based on structural principles. 
In a normal homology search, the sequence database is scanned with 
a single test sequence, and every residue must be weighted equally. 
However, some residues are more important than others and should 
be weighted accordingly. Moreover, certain regions of the protein 
are more likely to contain gaps than others. Both kinds of information 
can be obtained from sequence sets, and several techniques have 




Identification of Residue Roles from 
Sets of Sequences 

Often, a protein of interest is a member of a family of related 
sequences. What can we infer from the pattern of allowed substitu- 
tions at positions in sets of aligned sequences generated by genetic 
or phylogenetic methods? Residue positions that can accept a 
number of different side chains, including charged and highly polar 
residues, are almost certain to be on the protein surface. Residue 
positions that remain hydrophobic, whether variable or not, are 
likely to be buried within the structure. In Fig. 3, those residue 
positions in λ repressor that can accept hydrophilic side chains are 
shown in orange and those that cannot accept hydrophilic side 
chains are shown in green. The obligate hydrophobic positions 
define the core of the structure, whereas positions that can accept 
hydrophilic side chains define the surface. 
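
The rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the article: the function name classify_positions, the toy sequences, and the particular set of residues treated as hydrophilic are all assumptions made for the example.

```python
# Illustrative assumption: one-letter codes treated as hydrophilic.
# The article does not specify an exact residue list.
HYDROPHILIC = set("DEKRNQHST")


def classify_positions(aligned_seqs):
    """Label each alignment column 'surface' or 'core'.

    A column is called 'surface' if any active sequence carries a
    hydrophilic residue there, and 'core' if it stays hydrophobic in
    every sequence (whether variable or not).
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    labels = []
    for column in zip(*aligned_seqs):
        residues = set(column) - {"-"}  # ignore gap characters
        labels.append("surface" if residues & HYDROPHILIC else "core")
    return labels


# Hypothetical set of active sequences, already aligned.
active = ["MKVLE", "MRILD", "MEVLK", "MKLLQ"]
print(classify_positions(active))
```

A real analysis would need many more sequences per position before a column that never shows a hydrophilic residue could be called obligately hydrophobic.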

Functionally important residues should be conserved in sets of 
active sequences, but it is not possible to decide whether a side chain 
is functionally or structurally important just because it is invariant or 
conserved. To make this distinction requires an independent assay of 
protein folding. The ability of a mutant protein to maintain a stably 
folded structure can often be measured by biophysical techniques, 
by susceptibility to intracellular proteolysis (26), or by binding to 
antibodies specific for the native structure (27, 28). In the latter 
cases, it is possible to screen proteins in mutated clones for the 
ability to fold even if these proteins are inactive. Sets of sequences 
that allow formation of a stable structure can then be compared to 
the sets that allow both folding and function, with the active site or 
binding residues being those that are variable in the set of stable 
proteins but invariant in the set of functional proteins. The 
DNA-binding residues of Arc repressor were identified by this method (8). 
The receptor-binding residues of human growth hormone were also 
identified by comparing the stabilities and activities of a set of 
mutant sequences (28). However, in this case, the mutants were 
generated as hybrid sequences between growth hormone and related 
hormones with different binding specificities. 
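
The comparison of the two sequence sets can be written as a short Python sketch. It assumes that both sets are already aligned to the same length; the function names and the toy sequences are hypothetical and are not taken from the Arc repressor or growth hormone studies.

```python
def column_residues(aligned_seqs):
    """Return, for each alignment column, the set of residues observed."""
    return [set(column) for column in zip(*aligned_seqs)]


def functional_positions(stable_seqs, active_seqs):
    """Return 0-based positions that are variable among stably folded
    proteins but invariant among fully functional proteins."""
    stable_cols = column_residues(stable_seqs)
    active_cols = column_residues(active_seqs)
    if len(stable_cols) != len(active_cols):
        raise ValueError("both sets must share one alignment length")
    return [
        i
        for i, (stable, active) in enumerate(zip(stable_cols, active_cols))
        if len(stable) > 1 and len(active) == 1
    ]


# Hypothetical data: active proteins fold and function; the
# folded-only mutants fold stably but are inactive.
active = ["MKRLE", "MQRLD", "MERLK"]
folded_only = ["MKALE", "MQGLD"]
stable = active + folded_only

print(functional_positions(stable, active))
```

In this toy data only the third position qualifies. Positions that are invariant in both sets, such as the fourth, remain candidates for a structural role.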
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Fig. 5. A representation of one com- 
pact conformation for a particular 
sequence of H and P residues on a 
two-dimensional square lattice. 
[Adapted from (40), with permission 
of the American Chemical Society] 
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surface positions. The amphipathic patterns that emerge can be used 
to identify probable regions of secondary structure. Third, incorporating 
a knowledge of allowed substitutions can improve the ability 
to detect and align distantly related proteins because the essential 
residues can be given prominence in the alignment scoring. 

As more sequences are determined, it becomes increasingly likely 
that a protein of interest is a member of a family of related 
sequences. If this is not the case, it is now possible to use genetic 
methods to generate lists of allowed amino acid substitutions. 
Consequently, at least in the short term, it may not be necessary to 
solve the folding problem for individual protein sequences. Instead, 
information from sequence sets could be used. Perhaps by simplifying 
sequence space through the identification of key residues, and by 
simplifying conformation space as in the lattice methods, it will be 
possible to develop algorithms to generate a limited number of trial 
structures. These trial structures could then, in turn, be evaluated by 
further experiments and more sophisticated energy calculations. 
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If you have several similar nucleic acid or protein sequences, it is often useful to align corresponding bases or amino acids in 
columns. For instance, you might wish to group bases or amino acids that occupy similar positions in the three-dimensional 
structure, that exercise similar functions, or that have evolved by substitution from the same base or amino acid in an ancestral 
sequence. In the latter case you might also like to construct a phylogenetic tree. 



1. Global alignments. 

The Needleman and Wunsch algorithm for finding the best global alignment of two sequences can readily be extended to multiple 
sequences. The problem is that the time the computer needs for such a job is roughly proportional to the product of the 
sequence lengths. So, if aligning two sequences of 300 positions takes 1 second, aligning 3 sequences takes 300 seconds and 
aligning 10 sequences would take 300^8 seconds, which is longer than the lifetime of the universe! 
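
The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the text does, that running time is proportional to the product of the sequence lengths and that two sequences of 300 positions take 1 second. The function name and the constant for seconds per year are choices made for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def alignment_seconds(n_seqs, length=300, base_seconds=1.0):
    """Estimated time for a rigorous global alignment of n_seqs sequences.

    Time is taken as proportional to length ** n_seqs, calibrated so
    that two sequences cost base_seconds. Each extra sequence therefore
    multiplies the time by one more factor of length.
    """
    return base_seconds * length ** (n_seqs - 2)


for n in (2, 3, 10):
    seconds = alignment_seconds(n)
    print(f"{n:2d} sequences: {seconds:.3g} s = {seconds / SECONDS_PER_YEAR:.3g} years")
```

For ten sequences the estimate is on the order of 10^12 years.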

Since searching for a best global alignment using a rigorous algorithm is not realistic for more than three sequences, a number of 
strategies have been developed to carry out a multiple global alignment in a reasonable amount of time with a reasonable chance 
of finding the best alignment. The GCG program pileup first aligns all possible pairs of sequences according to Needleman and 
Wunsch (for n sequences, this makes n*(n-1)/2 alignments). Then it uses the pairwise similarity scores to construct a tree using 
the UPGMA method (see below). Finally, this tree serves as a guide for a progressive multiple alignment starting from the tips. 
Once two sequences have been aligned, their relative alignment is no longer changed. Clusters of previously aligned sequences 
are treated as a linearly weighted profile when they are subsequently aligned with another sequence or another cluster. 
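
The first stage, scoring every pair of sequences with a Needleman and Wunsch alignment, can be sketched as follows. This is a minimal illustration and not the pileup implementation. The match, mismatch, and gap values are assumptions chosen for simplicity (real programs use substitution matrices and more elaborate gap penalties), and the function names are hypothetical.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch
    dynamic programming with a linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score all n*(n-1)/2 pairs; keys are index pairs (i, j), i < j."""
    return {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


seqs = ["GATTACA", "GATACA", "GCTTACA", "TTACA"]
scores = pairwise_scores(seqs)
print(len(scores), "pairwise alignments for", len(seqs), "sequences")
print(scores)
```

The resulting scores are the input from which a guide tree would then be built.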



Other approaches include: 

. The very popular CLUSTAL program differs from pileup only in that it performs the initial pairwise alignments using the 
fast algorithm of Wilbur and Lipman. CABIOS 8:189 (1992). References. 
You can obtain versions of CLUSTAL for UNIX and for VAX. 

. Starting with a search for words of n bases or amino acids that are common between the sequences. An example is 
Martin Vingron's program MALI. CABIOS 5:115 (1989). References. 

MALI is not distributed freely but may be obtained from its author Martin Vingron (vingron@embl-heidelberg.de). 
. PIMA uses pattern matching, rather than profile matching, while making the progressive alignment. PNAS 87:118 (1990). 
References. 

PIMA can be obtained for UNIX and for VAX. 
. Building a phylogenetic tree, using a more elaborate algorithm, as the sequences are progressively aligned. An example is 
Jotun Hein's program TreeAlign. Meth. Enzymol. 183:626 (1990). 

TreeAlign can be obtained for UNIX and VMS from the same address as given for Clustal (see above). 
. Making the best multiple alignment in a limited area of alignment space. This can only realistically be performed with eight 
to ten sequences. 



2. Local alignments. 

There are cases where sequences share a similar region but are otherwise completely different. Take, for example, the amino 
acids in the active site of an enzyme or transcription factor binding sites in a DNA sequence. To handle these cases local 
multiple alignment algorithms have been developed. Usually they only look for ungapped alignments, thereby avoiding the problem 
of choosing the optimal gap penalty. Two such programs have been developed at the NCBI: 

MACAW by Schuler, Altschul and Lipman first tries to find high scoring segment pairs (HSPs) for each possible pair of sequences 
using the BLAST algorithm (with the sensitivity set high). It then assembles overlapping HSPs into blocks. An interesting feature 
of MACAW is that it does not try to align all sequences, but can pick out only those that share similar regions. Proteins 9:180 

There are versions of MACAW obtainable for the PC under Windows and for the Mac. 
The MACAW distribution also contains Gibbs (see below) and a pattern searcher. 
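
To make the idea of an ungapped high scoring segment pair concrete, the following Python sketch finds the best ungapped local alignment of two short sequences by scanning every diagonal exhaustively. It is not the BLAST algorithm, which uses a much faster word-based heuristic, and it is not MACAW's code. The scoring values and the function name are assumptions for illustration.

```python
def best_segment_pair(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) for the highest
    scoring ungapped segment pair between a and b.

    Every diagonal is scanned, and on each one the best run is found
    with a running sum that restarts whenever it drops to zero or below.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)  # start of this diagonal
        run_score, run_start, k = 0, 0, 0
        while i + k < len(a) and j + k < len(b):
            if run_score == 0:
                run_start = k  # a new candidate segment begins here
            run_score += match if a[i + k] == b[j + k] else mismatch
            if run_score <= 0:
                run_score = 0
            elif run_score > best[0]:
                best = (run_score, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best


score, start_a, start_b, length = best_segment_pair("XXHELLOYY", "ZHELLOZZZ")
print(score, "XXHELLOYY"[start_a:start_a + length])
```

Because no gaps are allowed, no gap penalty has to be chosen, which is the advantage noted above for local multiple alignment methods.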

The Gibbs sampler algorithm involves iteratively making a profile with stretches of n bases or amino acids, selected from the 
sequences, and then searching this profile against one of the sequences. The result of the search is used to weight the selection 
of the stretches at the next run. A drawback is that the user must choose the width n and the number of elements in each 
sequence and thus must have a certain idea of the outcome, or run the program several times. An interesting feature is that the 
Gibbs sampler algorithm avoids the choice of an externally added scoring scheme since it derives the highest scoring profile, in a 
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self-consistent manner, from the data. Science 262:208 (1993). References. 
Gibbs is available for UNIX. 

3. blast3. 

It is also worth mentioning the program blast3. This searches a protein against a protein databank using the BLAST algorithm 
(with the sensitivity set high) and then makes threefold alignments between the query sequence and each possible pair of 
databank sequences that have been found. Only the statistically significant threefold alignments which are made from three 
nonsignificant pairwise alignments are retained. blast3 is useful in finding proteins that share a region of only weak similarity. 
Occasionally it can show that a query sequence makes the bridge between two databank sequences whose relationship had not 
yet been suspected. 
You can look at the Manual. 

It is possible to access a BLAST (including blast3) server at the NCBI, either through WWW or with a specific blast Internet client 
that you can install on your computer. More INFO is available. 

4. Phylogenetic trees. 

Ideally a researcher would like to have a black box in which to throw sequences and get out a fully annotated phylogenetic tree. 
This is, however, not possible for two reasons. First, an algorithm that considers all possible multiple sequence alignments and 
then, for each alignment, all possible phylogenetic trees and picks out the best one, would take too much time. That is why most 
phylogenetic programs work on previously aligned sequences. Second, the result is always strongly influenced by the criteria that 
are used to define the best tree. Phylogenetic analysis will be the subject of a separate column in a later issue of embnet.news. 
However, a few remarks seem appropriate here. There are three main kinds of tree building methods: distance matrix, maximum 
likelihood and parsimony. 

Distance matrix methods first estimate the pairwise distances between the sequences (which means that the information in the 
alignment of two sequences is reduced to one number) while the other methods construct many trees from all the information in 
the multiple alignment and decide which is best. 

The simplest distance based method is UPGMA (unweighted pair-group method using arithmetic averages) which involves 
iteratively taking together the two sequences that have the shortest distance from each other, placing them at the end of 
branches on a node of the tree, and replacing their distances from the other sequences by an average value. 
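
The UPGMA procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the input format (a dictionary keyed by label pairs), and the toy distances are hypothetical, and branch lengths are not computed. When clusters of more than one sequence are merged, the average is weighted by cluster size, which is equivalent to averaging over all member sequences.

```python
def upgma(names, distances):
    """Build a UPGMA tree as nested tuples.

    names: list of sequence labels.
    distances: dict mapping (label_a, label_b) -> pairwise distance.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of members
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        closest = min(d, key=d.get)        # pair with the shortest distance
        a, b = sorted(closest, key=str)    # fixed order for reproducibility
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            # Replace the two old distances by their average,
            # weighted by cluster size.
            d[frozenset((merged, other))] = (
                size_a * d_a + size_b * d_b
            ) / (size_a + size_b)
        del d[closest]
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between four sequences.
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], toy))
```

Ties between equally close pairs are broken arbitrarily in this sketch.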

The guide tree used by pileup and CLUSTAL should never be used to infer phylogeny! It has been derived from the distances 
between pairwise aligned sequences and these distances are not necessarily the same as the distances between sequence pairs 
taken from the multiple sequence alignment. 
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